During 1974-1975, the author spent his sabbatical leave at the International Institute for Applied Systems Analysis, Laxenburg, Austria, where he was able to continue his research in credibility methods in a stimulating international scientific community.

Because of the difficulty of obtaining copies of research memoranda published during that period, it seems desirable to reproduce them in this format for distribution to interested colleagues, sponsors, and students. Naturally, credit for support and initial distribution of this work should remain with IIASA; two of the papers have been submitted to journals for possible publication.
Abstract

In classical credibility theory, a linearized Bayesian forecast of the fair premium for an individual risk contract is made using prior estimates of the collective fair premium and individual experience data. However, collateral data from other contracts in the same portfolio is not used, in spite of intuitive feelings that this data would contain additional evidence about the quality of the risk collective from which the portfolio was drawn. By using a hierarchical model, one makes the individual risk parameters exchangeable, in the sense of de Finetti, and a modified credibility formula is obtained which uses the collateral data in an intuitively satisfying manner. The homogeneous formula of Bühlmann and Straub is obtained as a limiting case when the hyperprior distribution becomes "diffuse".


0. Introduction

In the usual collective model of risk theory [1], the random variables generated by individual risks are assumed to be independent, once the individual risk parameters are known. However, a priori, only collective (portfolio) statistics are available, taken from a distribution which is mixed over a prior distribution of the parameter. We assume that unlimited statistics are available for the collective as a whole, and a limited amount of experience (sample) data for individual risks drawn at random from the collective.
In classical credibility theory, we make a linearized Bayesian forecast of the next observation of a particular individual risk, using his experience data and the statistics from the collective; the resulting formula, which has been known in various forms for over fifty years, requires only the individual sample mean, and the first and second moments from the collective.

If one attempts to use collateral data from other risks in a credibility forecast of a certain individual risk, it turns out that this cohort data has zero weight, and is discarded in favor of the assumed-known collective statistics. This is essentially because the various individual risk parameters are assumed to be independent and representative samples from the prior distribution.

This result is disturbing to many analysts, who feel that data from other risks in the portfolio contains valuable collateral information about the collective. In several of their models, Bühlmann and Straub [3,4] argue that, since the (mixed) moments of the collective must be estimated anyway, a credibility forecast should be only in terms of cohort data. They achieve a partial result of this kind by using a proportional function of all experience data; this forces the use of cohort data into an estimate of the collective mean, but the second-moment components are still required. In [12], the author describes a model in which the individual risk parameters are correlated through an "externalities" model; the resulting formula uses both cohort sample data and the first and second collective moments. In [18], Taylor describes a model in which the "manual premium" (collective mean) is itself a random variable, and also obtains a formula in which collateral data is used. Finally, we should mention that similar arguments are advanced about the use of cohort data in the otherwise unrelated "empirical Bayes" models [14,16].

In this paper, we attempt a reconciliation of these approaches, based upon the ideas of hierarchical models [13,14,15] and model identification [17,19]. Although we obtain results similar to those already described in [12], the justification is completely different and, we believe, provides a more natural explication of the situations in which collateral data should be used.

1. The Basic Model

In the basic model of the collective, we imagine that individual risk contracts are characterized by a risk parameter, $\theta$, which is drawn from a known prior density, $p(\theta)$. A cohort, or portfolio, of such contracts consists of a finite population $[\theta_1, \theta_2, \ldots, \theta_r]$, whose members are drawn independently from the same density.

Then, given $\theta_i$, we suppose that we have likelihood densities, $p_i(x_{it} \mid \theta_i)$,¹ which govern the generation of $n_i$ independent and identical realizations of the risk random variable, $\tilde x_{it}$ $(t = 1,2,\ldots,n_i)$. In other words, from the total portfolio, we have $r$ individual experience data records, $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in_i}]$, which, together, we refer to as the total experience, $X$. Note that each process is stationary over time, but that we (temporarily) permit the individual risks to have different distributions. In particular, we need to define the first two conditional moments:

$$m_i(\theta_i) = \mathcal{E}\{\tilde x_{it} \mid \theta_i\}\ ; \qquad v_i(\theta_i) = \mathcal{V}\{\tilde x_{it} \mid \theta_i\}\ . \tag{1.1}$$

Prior to the data, $p(\theta)$ is the same prior density for any arbitrary risk drawn from the collective; thus, a priori, we have the following average moments for risks of the $i$th and $j$th types:

$$m_i = \mathcal{E}\{\tilde x_{it}\} = \mathcal{E}\{m_i(\tilde\theta_i)\}\ ; \tag{1.2}$$

$$E_{ij} = \begin{cases} \mathcal{E}\{v_i(\tilde\theta_i)\} & (i = j)\\ 0 & (i \neq j)\end{cases}\ ; \tag{1.3}$$

$$D_{ij} = \mathcal{C}\{m_i(\tilde\theta_i);\, m_j(\tilde\theta_j)\} = \begin{cases} \mathcal{V}\{m_i(\tilde\theta_i)\} & (i = j)\\ 0 & (i \neq j)\end{cases}\ . \tag{1.4}$$

¹ We adopt the usual convention that all densities are indicated by $p(\cdot)$, the arguments indicating the appropriate random variable(s). The random variables themselves are indicated, where necessary, by a tilde. Finally, to avoid complicated subscripts, we define the multiple conditional expectation

$$\mathcal{E}\{f(a,b,c) \mid a \mid b \mid c\}$$

as being the expectation of $f(a,b,c)$ using measure $p(a \mid b,c)$, followed by the expectation using measure $p(b \mid c)$, followed by the expectation using $p(c)$. Any of these arguments may be multiple, and other operators, such as variance, $\mathcal{V}$, and covariance, $\mathcal{C}$, may be used.
Note in particular that there are no covariances between risks $i$ and $j$ $(i \neq j)$, for two reasons:

(i) assumed independence between $\tilde x_{it}$ and $\tilde x_{ju}$, given $\theta_i$ and $\theta_j$;

(ii) assumed independence between $\tilde\theta_i$ and $\tilde\theta_j$.

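The total prior-to-data covariance between individual risks is then:

$$\mathcal{C}\{\tilde x_{it};\, \tilde x_{ju}\} = \begin{cases} E_{ii} + D_{ii} & (i = j,\ t = u)\\ D_{ii} & (i = j,\ t \neq u)\\ 0 & (i \neq j)\end{cases} \tag{1.5}$$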


The basic problem of credibility theory is to forecast the next observation, $\tilde x_{s,n_s+1}$, of a selected risk, $s$, given the total data from all risks, $X = [\mathbf{x}_i \mid (i = 1,2,\ldots,r)]$, and using the linear function:

$$f_s(X) = a_0 + \sum_{i=1}^{r}\sum_{t=1}^{n_i} a_{it}\, x_{it}\ , \tag{1.6}$$

in which the coefficients $\{a_{it}\}$ are chosen so as to approximate the conditional mean $\mathcal{E}\{\tilde x_{s,n_s+1} \mid X\}$ in the least-squares sense, over all prior possible data records, $p(X)$.

The appropriate least-squares formulae have been presented elsewhere (see, e.g., [7,12]). It turns out, for the basic model described above, that:

(i) $a_{it} = a_i$ $(i = 1,2,\ldots,r)(t = 1,2,\ldots,n_i)$, because of the stationarity assumption;

(ii) $a_i = 0$ $(i \neq s)$, because $D_{si} = 0$ $(i \neq s)$, that is, $\theta_i$ and $\theta_s$ are independent.
Defining the credibility factor, $Z_i$, and time constant, $N_i$, as:

$$Z_i = n_i/(n_i + N_i)\ ; \qquad N_i = E_{ii}/D_{ii}\ , \tag{1.7}$$

and the $i$th experience sample mean, $\bar x_i$, as:

$$\bar x_i = \frac{1}{n_i}\sum_{t=1}^{n_i} x_{it}\ , \tag{1.8}$$

we obtain the final credibility forecast as:

$$f_s(X) = (1 - Z_s)\,m_s + Z_s\,\bar x_s + 0\cdot\{\bar x_i \mid i \neq s\}\ . \tag{1.9}$$

Various interesting interpretations of this classical result are possible [7,8,12], and it is known that (1.9) is, in fact, the exact Bayesian conditional mean for a large and important class of prior and likelihood densities [9,10].
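As a rough illustration, the classical forecast (1.7)-(1.9) can be sketched in a few lines of Python, assuming the three collective moments $m_s$, $E_{ss}$ and $D_{ss}$ are already known numbers; the function name and the figures in the usage line are hypothetical. Only the risk's own record enters, which is the zero-weight property of the cohort data discussed in Section 2.

```python
def classical_credibility(x_s, m_s, E_ss, D_ss):
    """Classical credibility forecast (1.9) for a single risk s.

    x_s              : the risk's own observations x_s1, ..., x_sn (non-empty)
    m_s, E_ss, D_ss  : collective moments, assumed known
    """
    n = len(x_s)
    N = E_ss / D_ss            # time constant, eq. (1.7)
    Z = n / (n + N)            # credibility factor, eq. (1.7)
    xbar = sum(x_s) / n        # individual sample mean, eq. (1.8)
    return (1 - Z) * m_s + Z * xbar


# Hypothetical figures: N = 400/100 = 4 and n = 4, so Z = 0.5 and the
# forecast lies halfway between m_s = 100 and the sample mean 125.
print(classical_credibility([120, 130, 110, 140], 100.0, 400.0, 100.0))  # 112.5
```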


2. Objections and Previous Results

Two practical objections to the result (1.9) seem to be raised in the literature. The first is that three prior-to-data moments, $m_s$, $E_{ss}$, and $D_{ss}$, must be estimated from the collective for each risk which is forecast. Even in the more usual, identical-risk case, where $m_i = m$, $E_{ii} = E$, and $D_{ii} = D$ for all samples $i = 1,2,\ldots,r$, (1.9) provides no assistance in estimating the common moments. This concern is related to the second objection, namely, that there ought to be some use for the cohort data, $\{\mathbf{x}_i \mid i \neq s\}$, since it is precisely from this data that one would attempt to form estimates of the first and second moments in actual practice. This collateral data ought, then, to be used either to form initial estimates of $m$, $E$, and $D$, or, in the case in which one had vague prior estimates of them, to somehow revise them as more portfolio-wide data becomes available. Notice that we are not talking about any problems of non-stationarity, such as inflation or shifts in the risk environment, but just the vague notion that our collective might, in some way, be different from the initially-assumed statistics.

Bühlmann and Straub [3] were the first to point out that one can force all the data in $X$ to be used by setting $a_0$ in (1.6) equal to zero, and constraining the remaining coefficients to give a forecast which is unbiased, as in (1.9). For the simple model of the last section, in which the $\tilde x_{it}$ are not identically distributed, we obtain:

$$f_s(X) = (1 - Z_s)\left\{ m_s\,\frac{\sum_{i=1}^{r} (Z_i m_i/D_{ii})\,\bar x_i}{\sum_{j=1}^{r} Z_j m_j^2/D_{jj}} \right\} + Z_s\,\bar x_s\ . \tag{2.1}$$

The term in braces, which uses all the sample data, even that of risk $s$, is a substitute for $m_s$ in (1.9); however, there is no simplification as far as collective moments to be estimated are concerned, since all the $m_i$, $E_{ii}$, and $D_{ii}$ are used.

But in the important case where all risks are assumed to be identically distributed, for the same value of $\theta$, (2.1) simplifies to:
$$f_s(X) = (1 - Z_s)\left\{\frac{\sum_{i=1}^{r} Z_i\,\bar x_i}{\sum_{j=1}^{r} Z_j}\right\} + Z_s\,\bar x_s\ , \tag{2.2}$$

and now the forecast depends upon $Z_i = n_i/(n_i + N)$, with $N = E/D$ as a ratio between variance components which must be estimated from the collective. Of course, the forecast (2.2) must give a higher value of the mean-square error that was minimized to find (1.9), since it optimizes the same criterion under an additional constraint.

If all data records are of the same length, $n_i = n$ and $Z_i = Z = n/(n + N)$ $(i = 1,2,\ldots,r)$, the surrogate for $m_s$ in the braces in (2.2) becomes simply:

$$\sum_{i=1}^{r} \bar x_i / r = \sum_{i=1}^{r}\sum_{t=1}^{n} x_{it} / rn\ , \tag{2.3}$$

the grand sample mean of all cohort data!
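The homogeneous forecast $f_s(X) = (1 - Z_s)\sum_i Z_i\bar x_i/\sum_j Z_j + Z_s\bar x_s$ can be sketched the same way, again only as an illustration; the function name is hypothetical and $N = E/D$ is taken as given. The usage lines check the remark above: with records of equal length, the surrogate for $m_s$ is the grand mean of all the cohort data.

```python
def buhlmann_straub(records, s, N):
    """Homogeneous credibility forecast for risk s (identical-risk case).

    records : list of experience records, one non-empty list per risk
    s       : index of the risk to be forecast
    N       : time constant N = E/D, assumed known
    """
    Z = [len(r) / (len(r) + N) for r in records]                # Z_i = n_i/(n_i + N)
    xbar = [sum(r) / len(r) for r in records]                   # sample means (1.8)
    surrogate = sum(z * xb for z, xb in zip(Z, xbar)) / sum(Z)  # replaces m_s
    return (1 - Z[s]) * surrogate + Z[s] * xbar[s]


records = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # equal lengths n = 3
# With N = 3, every Z_i = 0.5 and the surrogate is the grand mean 45/9 = 5.
print(buhlmann_straub(records, 0, 3.0))         # 0.5*5 + 0.5*2 = 3.5
```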

In some work on "related risk" models [12], the author assumed a situation in which the risk parameters $\boldsymbol\theta = [\theta_1, \theta_2, \ldots, \theta_r]$ are statistically dependent, with known joint prior. The only effect of this assumption is to introduce non-zero terms into the last line of (1.4), viz.:

$$D_{ij} = \mathcal{C}\{m_i(\tilde\theta_i);\, m_j(\tilde\theta_j)\} \tag{2.4}$$

for all $i, j$. If the underlying risk likelihoods are different, then a multidimensional credibility model [7,11] must be used, with an $r \times r$ system of equations solved to find a matrix of credibility factors. However, in the important special case where the risks are identically distributed, given $\theta$, and $p(\boldsymbol\theta)$ consists of exchangeable random variables, there are only four collective moments: $m$, $E$, and, say, $D_{11}$ and $D_{12}$ for the cases in which $i = j$ and $i \neq j$, respectively, in (2.4). One may easily show that, with this correlation between risk parameters added, (1.9) becomes:

$$f_s(X) = (1 - Z_s)\left\{\frac{(D_{11} - D_{12})\,m + D_{12}\sum_{j=1}^{r} Z_j\,\bar x_j}{(D_{11} - D_{12}) + D_{12}\sum_{j=1}^{r} Z_j}\right\} + Z_s\,\bar x_s\ , \tag{2.5}$$


where the credibility factors now require a modified correlation time constant,

$$Z_i = n_i/(n_i + N_{12})\ ; \qquad N_{12} = E/(D_{11} - D_{12})\ . \tag{2.6}$$

As in (2.2), the expression in braces in (2.5) is an estimate for the mean $m_s$, which can be seen to be different from $m$, because of the non-representative way in which the cohort of $r$ risks may have been selected. As the correlation between the parameters vanishes, $D_{12} \to 0$, (2.5) reduces to the usual formula (1.9), with all the collateral data being thrown away.
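A sketch of (2.5)-(2.6) in the same style, assuming $D_{11} > D_{12} \ge 0$ so that $N_{12}$ is finite; names and numbers are illustrative only. Setting $D_{12} = 0$ should reproduce the classical forecast (1.9), which the first usage line checks; a positive $D_{12}$ pulls the forecast toward the cohort.

```python
def correlated_forecast(records, s, m, E, D11, D12):
    """Forecast (2.5) with exchangeable, correlated risk parameters.

    records  : list of experience records, one non-empty list per risk
    D11, D12 : collective moments D_ij of (2.4) for i == j and i != j;
               assumes D11 > D12 >= 0.
    """
    N12 = E / (D11 - D12)                                   # eq. (2.6)
    Z = [len(r) / (len(r) + N12) for r in records]
    xbar = [sum(r) / len(r) for r in records]
    num = (D11 - D12) * m + D12 * sum(z * xb for z, xb in zip(Z, xbar))
    den = (D11 - D12) + D12 * sum(Z)
    return (1 - Z[s]) * num / den + Z[s] * xbar[s]


records = [[120, 130, 110, 140], [90, 95]]
# D12 = 0: the braces collapse to m and the cohort record [90, 95] is ignored.
print(correlated_forecast(records, 0, 100.0, 400.0, 100.0, 0.0))   # 112.5
# D12 > 0: the lower cohort record now pulls the forecast down somewhat.
print(correlated_forecast(records, 0, 100.0, 400.0, 100.0, 50.0))
```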

Although this model is satisfactory from the mathematical point of view of explaining when cohort data would be used in a linear forecast, it does not show why there could be correlation in the collective, why the risk parameters should be exchangeable random variables, and under what conditions this correlation would be weak or strong. For this purpose, we need to extend the traditional model of the collective into a hierarchical model.
3. A Hierarchical Model

In our expanded model, the concepts of individual risk random variables, risk parameters, and a cohort of risks chosen from a collective are retained, but we imagine that our collective, the one under study, is not necessarily representative of other possible collectives which are drawn from some larger universe of collectives.

Formally, this means that there is a collective selection hyperparameter, $\varphi$, which describes how possible collectives may vary from one another, when chosen from some hyperprior density $p(\varphi)$. Once $\varphi$ is chosen and the collective characteristics are defined, the risk parameters $[\theta_i]$ are chosen for each of the $r$ members of our cohort, independently and identically distributed from a prior density $p(\theta \mid \varphi)$. Finally, the $n_i$ experience samples for each individual risk $i$ are drawn independently from a likelihood, $p_i(x_{it} \mid \theta_i, \varphi)$. Notice that the risk parameters and the individual risks are now independent only if $\varphi$ is given; from the prior-to-selection-of-collective point of view, there is apparent correlation between cohort results because of the mixing on $\varphi$.

This somewhat abstract model has a very practical interpretation. Imagine an insurance company in which the individual risk is an individual insurance contract, and the collective is just a portfolio of similar coverages within our company. It is well recognized that portfolios vary from company to company, depending upon sales strategy, available customers, local risk conditions, etc.; our portfolio may be better or worse than, say, the nationwide average. The universe of collectives, then, corresponds to the union of all possible risk contracts of this type in the nation, for which we may assume adequate statistics are available. Thus, in a hierarchical model, we hope to use nationwide statistics, together with all the data from our portfolio, not only to predict next year's fair premium for individual risks, but also to draw inferences about what kind of portfolio we have.

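For the development of a least-squares forecast, we start with the individual risk moments of $p_i(x_{it} \mid \theta_i, \varphi)$:

$$m_i(\theta_i, \varphi) = \mathcal{E}\{\tilde x_{it} \mid \theta_i, \varphi\}\ ; \qquad v_i(\theta_i, \varphi) = \mathcal{V}\{\tilde x_{it} \mid \theta_i, \varphi\}\ , \tag{3.1}$$

and, from the usual conditional arguments, form the universal-average mean of the $i$th type:

$$M_i = \mathcal{E}\{\tilde x_{it}\} = \mathcal{E}\{m_i(\tilde\theta_i, \tilde\varphi) \mid \theta_i \mid \varphi\}\ . \tag{3.2}$$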

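The universal covariances, using the conditional independence properties described above, are:

$$\mathcal{C}\{\tilde x_{it};\, \tilde x_{ju}\} = \begin{cases} F_{ii} + G_{ii} + H_{ii} & (i = j,\ t = u)\\ G_{ii} + H_{ii} & (i = j,\ t \neq u)\\ H_{ij} & (i \neq j)\end{cases} \tag{3.3}$$

where

$$F_{ii} = \mathcal{E}\{v_i(\tilde\theta_i, \tilde\varphi) \mid \theta_i \mid \varphi\}\ , \tag{3.4}$$

$$G_{ii} = \mathcal{E}\mathcal{V}\{m_i(\tilde\theta_i, \tilde\varphi) \mid \theta_i \mid \varphi\}\ , \tag{3.5}$$

and

$$H_{ij} = \mathcal{C}\big\{\mathcal{E}\{m_i(\tilde\theta_i, \tilde\varphi) \mid \theta_i\};\ \mathcal{E}\{m_j(\tilde\theta_j, \tilde\varphi) \mid \theta_j\} \mid \varphi\big\}\ . \tag{3.6}$$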


Several remarks are in order. From one point of view, what we have done is to introduce correlation between risk parameters of members of the same collective, for on comparing the above with (1.5) as modified by (2.4), we get the formal equivalences:

$$E_{ii} \equiv F_{ii}\ ; \qquad D_{ii} \equiv G_{ii} + H_{ii}\ ; \qquad D_{ij} \equiv H_{ij}\ \ (i \neq j)\ . \tag{3.7}$$

However, the interpretation is completely different, as we have seen.

The second observation is that it might seem worthwhile to decouple the $\tilde x_{it}$ from $\varphi$ and make the likelihood dependent only upon $\theta_i$; this might simplify some of the computations above, but does not diminish the number of individual prior-to-selection-of-collective moments needed.

However, in the important special case where the individual risk contracts are similar, giving identical likelihoods, given $\theta_i$ and $\varphi$, it can be seen that only four moments remain: $M$, $F$, $G$, and $H$. These may be interpreted in terms of our simpler model by noticing that it is as if the moments of Section 1 had a hidden dependence upon an unknown parameter $\varphi$. Calling those moments, then, $m(\varphi)$, $E(\varphi)$, and $D(\varphi)$, we see that the universal moments are equivalent to:

$$M = \mathcal{E}\,m(\tilde\varphi)\ ; \quad F = \mathcal{E}\,E(\tilde\varphi)\ ; \quad G = \mathcal{E}\,D(\tilde\varphi)\ ; \quad H = \mathcal{V}\,m(\tilde\varphi)\ . \tag{3.8}$$
In other words, $M$, $F$, and $G$ are universe-averaged versions of our previous $m$, $E$, and $D$. $H$, however, is new, and represents the variance of the fair premium over all possible collectives.
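One way to make these moments concrete is a small Monte Carlo experiment. The sketch below assumes, purely for illustration, normal densities at all three levels (any densities with the same moments would serve for this check); the function name, parameter values and trial count are hypothetical. The hierarchy implies covariance $F + G + H$ for an observation with itself, $G + H$ between two observations of the same risk, and $H$ between observations of different risks in the same collective, and the sample estimates should land near those values.

```python
import random


def simulate_covariances(M, F, G, H, trials=100_000, seed=1):
    """Monte Carlo estimates of Var{x_11}, Cov{x_11; x_12} and Cov{x_11; x_21}.

    Illustrative assumption: normal densities at every level,
        phi ~ N(M, H),  theta_i | phi ~ N(phi, G),  x_it | theta_i ~ N(theta_i, F).
    Each simulated collective supplies two risks; risk 1 gets two observations
    (x_11, x_12) and risk 2 one observation (x_21).
    """
    rng = random.Random(seed)
    x11, x12, x21 = [], [], []
    for _ in range(trials):
        phi = rng.gauss(M, H ** 0.5)          # choose a collective
        th1 = rng.gauss(phi, G ** 0.5)        # two risk parameters, i.i.d. given phi
        th2 = rng.gauss(phi, G ** 0.5)
        x11.append(rng.gauss(th1, F ** 0.5))  # observations, independent given theta
        x12.append(rng.gauss(th1, F ** 0.5))
        x21.append(rng.gauss(th2, F ** 0.5))

    def cov(u, v):
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

    return cov(x11, x11), cov(x11, x12), cov(x11, x21)


# With F = 4, G = 2, H = 1 the estimates should be close to (7, 3, 1).
print(simulate_covariances(10.0, 4.0, 2.0, 1.0))
```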


4. Universal Forecasts

Continuing with the important special case of identical risk distributions, it follows easily from least-squares theory and the above definitions that the optimal credibility forecast for the hierarchical model is:

$$f_s(X) = (1 - Z_s)\left\{\frac{G\,M + H\sum_{j=1}^{r} Z_j\,\bar x_j}{G + H\sum_{j=1}^{r} Z_j}\right\} + Z_s\,\bar x_s\ , \tag{4.1}$$

where now a new universal time constant, $N_U$, appears in the credibility factors:

$$N_U = F/G\ ; \qquad Z_i = n_i/(n_i + N_U)\ . \tag{4.2}$$

Alternatively, we can get (4.1) from (2.5) and (3.7).

Following an idea of Taylor for his model [18], we note that (4.1) can be split into two parts:

$$f_s(X) = (1 - Z_s)\,\hat M(X) + Z_s\,\bar x_s\ ; \tag{4.3}$$

$$\hat M(X) = \frac{G\,M + H\sum_{j=1}^{r} Z_j\,\bar x_j}{G + H\sum_{j=1}^{r} Z_j}\ . \tag{4.4}$$

The second formula may be regarded as a revision of the "prior expected manual premium", $M$, using the experience data of all members of the cohort to obtain an "adjusted manual premium", $\hat M(X)$. This revised manual premium is then used in an ordinary credibility formula with the appropriate individual credibility factor, $Z_s$, for the forecast risk $s$.
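The two-stage split (4.3)-(4.4) suggests computing the adjusted manual premium first and then applying the individual factor. A minimal sketch, assuming the universal moments $M, F, G, H$ are known and $G > 0$; all names and numbers are illustrative. With $H = 0$ the adjustment disappears and the adjusted premium equals $M$.

```python
def hierarchical_forecast(records, s, M, F, G, H):
    """Universal credibility forecast (4.1), computed via (4.3)-(4.4).

    records    : list of experience records, one non-empty list per risk
    s          : index of the risk to be forecast
    M, F, G, H : universal moments of (3.8), assumed known, with G > 0
    Returns the pair (f_s(X), adjusted manual premium M_hat(X)).
    """
    N_U = F / G                                             # eq. (4.2)
    Z = [len(r) / (len(r) + N_U) for r in records]         # individual factors Z_i
    xbar = [sum(r) / len(r) for r in records]
    weighted = sum(z * xb for z, xb in zip(Z, xbar))
    M_hat = (G * M + H * weighted) / (G + H * sum(Z))       # eq. (4.4)
    return (1 - Z[s]) * M_hat + Z[s] * xbar[s], M_hat      # eq. (4.3)


records = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_s, M_hat = hierarchical_forecast(records, 0, 0.0, 6.0, 2.0, 1.0)
# M_hat = 7.5/3.5: moved from the prior M = 0 toward the cohort mean 5.
print(f_s, M_hat)
```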

The credibility revision of the universal mean (4.4) depends in a complicated manner upon the amount of data from each risk. However, if all data records are of the same length $n$, then $Z_i = Z = n/(n + N_U)$ for all $i$, and (4.4) can be rewritten:

$$\hat M(X) = (1 - Z_c)\,M + Z_c\left(\frac{1}{r}\sum_{i=1}^{r} \bar x_i\right)\ , \tag{4.5}$$

where the collective credibility factor, $Z_c$, is:

$$Z_c = \frac{rnH}{F + nG + rnH} = \left[\frac{rH}{G + rH}\right]\left[\frac{n}{n + F/(G + rH)}\right]\ . \tag{4.6}$$

If $rH$ is large compared to $G$, this function increases at first more rapidly than the common individual credibility factor $Z$, as $n$ increases; however, $Z_c$ has an asymptotic limit, $rH/(G + rH)$, which is less than unity, so that (4.5) is not a credibility formula in the usual sense; that is, the grand sample mean is not ultimately "fully credible" for $m(\varphi)$.

This puzzling result can be explained by remembering that the risk parameters of the cohort, $[\theta_i \mid i = 1,2,\ldots,r]$, once picked, remain the same for all $n$. Therefore, if one estimates a fair premium for an arbitrary new member of the portfolio, say, with risk parameter $\theta_{r+1}$, then there remains the possibility that the cohort sample is biased. Thus $Z_c$ does not approach unity with increasing $n$, unless $rH \gg G$, which means that a large enough portfolio contains a representative sample of risk parameters. This effect is not important in our estimate of $\tilde x_{s,n+1}$, because of the factor $(1 - Z_s)$ in (4.1).

If, on the other hand, we did wish to estimate the fair premium averaged over the current portfolio,

$$\frac{1}{r}\sum_{i=1}^{r} \tilde x_{i,n+1}\ ,$$

then one can show that (4.5) is still correct if a different credibility factor,

$$Z_0 = (nG + rnH)/(F + nG + rnH)\ , \tag{4.7}$$

is used; this does approach unity with increasing $n$.
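The behaviour of the two collective factors is easy to check numerically. The sketch below, with hypothetical parameter values, evaluates the common factor $Z$, $Z_c = rnH/(F + nG + rnH)$ and $Z_0 = (nG + rnH)/(F + nG + rnH)$ for growing $n$: $Z_c$ levels off near $rH/(G + rH)$, here $5/7 \approx 0.714$, while $Z$ and $Z_0$ tend to 1. The last line checks the identity $1 - Z_0 = (1 - Z)(1 - Z_c)$, which is one way to see why $Z_0$ reaches unity.

```python
def collective_factors(n, r, F, G, H):
    """Factors for r records of common length n (F, G, H as in (3.8)).

    Z   : common individual factor n/(n + F/G)
    Z_c : collective factor (4.6), for a new member of the portfolio
    Z_0 : factor (4.7), for the average over the current portfolio
    """
    Z = n / (n + F / G)
    Z_c = r * n * H / (F + n * G + r * n * H)
    Z_0 = (n * G + r * n * H) / (F + n * G + r * n * H)
    return Z, Z_c, Z_0


for n in (1, 10, 100, 10_000):
    Z, Z_c, Z_0 = collective_factors(n, r=5, F=6.0, G=2.0, H=1.0)
    print(f"n={n:>6}  Z={Z:.4f}  Z_c={Z_c:.4f}  Z_0={Z_0:.4f}")

Z, Z_c, Z_0 = collective_factors(7, r=5, F=6.0, G=2.0, H=1.0)
print(abs((1 - Z_0) - (1 - Z) * (1 - Z_c)) < 1e-12)   # True
```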

5. Limiting Cases

The time constant $N_U = F/G$ is just the universe-average version of the classical Bühlmann time constant $N = E/D$, so that (4.3) is in a certain sense similar to (1.9). However, the factor $H = \mathcal{V}\,m(\tilde\varphi)$ is completely new, and it is interesting to examine limiting cases.

If $H \to 0$, then we may say that all collectives are representative samples from the rather narrow universe of collectives in which there is little variance in fair premium. Thus $M \to m$, $G \to D$, $N_U \to N$, and $Z_c \to 0$. No updating of the fair premium is necessary from the collateral data, and (4.3)-(4.4) reduce to the classical model (1.9).
On the other hand, if $H \to \infty$, this means that collectives are drastically different from one another or, in Bayesian language, we have a "diffuse prior" on $m(\varphi)$. Then from (4.4) or (4.6), we see that, whenever there is cohort data, it is "fully credible" for $m(\varphi)$, and (4.1) reduces to the Bühlmann-Straub proportional forecast (2.2)!

The same effect occurs in (4.6) as $r \to \infty$, but for a different reason: the grand sample mean of $X$ is almost surely the correct mean, $m(\varphi)$, for our collective, and thus $M$ is eliminated.
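A short numerical check of the two limits, using the adjusted manual premium (4.4) with hypothetical data and moments: as $H$ shrinks, it stays at the prior $M$; as $H$ grows, it approaches the credibility-weighted cohort mean $\sum_i Z_i\bar x_i/\sum_j Z_j$, the Bühlmann-Straub surrogate for the collective mean. The function name is illustrative only.

```python
def adjusted_manual_premium(records, M, F, G, H):
    """M_hat(X) of (4.4); each record is a non-empty list of observations."""
    Z = [len(r) / (len(r) + F / G) for r in records]
    xbar = [sum(r) / len(r) for r in records]
    return (G * M + H * sum(z * xb for z, xb in zip(Z, xbar))) / (G + H * sum(Z))


records = [[3.0, 5.0], [8.0, 6.0, 7.0], [4.0]]    # hypothetical, unequal lengths
M, F, G = 10.0, 4.0, 2.0
for H in (1e-9, 1.0, 1e9):
    print(H, adjusted_manual_premium(records, M, F, G, H))
# Small H: close to the prior M = 10. Large H: close to the
# credibility-weighted cohort mean sum(Z_i * xbar_i) / sum(Z_i).
```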


6. Approximation Error

The value of any forecast must be judged in terms of the mean-square error:

$$I = \mathcal{E}\{[\tilde x_{s,n_s+1} - f_s(X)]^2\}\ . \tag{6.1}$$

A certain portion of this error is due to individual fluctuation, and cannot be removed; the remainder is essentially an approximation error between the chosen forecast and the optimal Bayesian forecast, $\mathcal{E}\{\tilde x_{s,n_s+1} \mid X\}$ (see, e.g., [12]). We now examine the mean-square error for several of the forecasts suggested previously.

The first and simplest possibility is to take the universal mean, $f_s(X) = M$, as an estimator. Then:

$$I_1 = F + G + H\ , \tag{6.2}$$

that is, no component of variance is removed.
The second possibility, suggested by the surrogate for the collective mean in (2.2), is to take the credibility-weighted mean of all cohort data, $f_s(X) = \sum_i Z_i\bar x_i / \sum_j Z_j$, giving:

$$I_2 = F + G\left[1 + \frac{1 - 2Z_s}{\sum_{j=1}^{r} Z_j}\right]\ , \tag{6.3}$$

which removes the fluctuation component $H$, but may increase the second term for $Z_s < \tfrac{1}{2}$.

A third collective-wide possibility which has already been justified is the "adjusted manual premium", $\hat M(X)$, in (4.4), for which:

$$I_3 = F + G + H\left[\frac{G(1 - 2Z_s)}{G + H\sum_{j=1}^{r} Z_j}\right]\ . \tag{6.4}$$


Turning now to forecasts which use the data from the individual risk in a special way, we could use the Bühlmann-Straub homogeneous formula (2.2), giving:

$$I_4 = F + G(1 - Z_s)\left[1 + \frac{1 - Z_s}{\sum_{j=1}^{r} Z_j}\right]\ . \tag{6.5}$$

Also of interest would be an individual forecast in which the cohort data is ignored, (1.9):

$$I_5 = F + G(1 - Z_s) + H(1 - Z_s)^2\ . \tag{6.6}$$

Finally, we have the variance when the optimal universal forecast (4.1) is used:

$$I_6 = F + G(1 - Z_s) + H\left[\frac{G}{G + H\sum_{j=1}^{r} Z_j}\right](1 - Z_s)^2\ . \tag{6.7}$$
Notice that none of the forecasts removes $F$; this is the irreducible variance component. Comparison of different forecasts depends in general upon the values of $G$, $H$, and the credibility factors; for example, one cannot say that $I_2$ is uniformly better than $I_1$.

The following relationships do hold, however, for all values of the coefficients:

$$I_6 \le I_4 \le I_2\ ; \qquad I_6 \le I_5 \le I_1\ ; \qquad I_6 \le I_3 \le I_1\ .$$

This effectively removes $I_1$ and $I_2$ from the second-rank contenders, after the optimal forecast $I_6$.

The Bühlmann-Straub formula, $I_4$, would seem to have special appeal because of the fact that $H$ is removed completely. However, $I_6 < I_4$ always; and when $H \to \infty$, $I_6$ approaches a finite limit as well (namely, $I_4$). By contrast, the classical individual credibility mean-square error, $I_5$, continues to increase as the universal prior becomes more diffuse, and this is the basic justification for including the cohort data.
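The mean-square errors (6.2)-(6.7) are easy to tabulate, and orderings among them can be spot-checked over random parameter values. The sketch below checks that the optimal forecast's error $I_6$ never exceeds $I_3$, $I_4$ or $I_5$, together with $I_4 \le I_2$, $I_5 \le I_1$ and $I_3 \le I_1$, each of which follows from (6.2)-(6.7) by direct subtraction. It is a numerical sanity check under hypothetical random draws, not a proof; the function name and sampling ranges are illustrative.

```python
import random


def mse_table(Z, s, F, G, H):
    """Mean-square errors (6.2)-(6.7) of the six forecasts in Section 6.

    Z : individual credibility factors Z_i = n_i/(n_i + F/G), one per risk
    s : index of the forecast risk
    """
    S, Zs = sum(Z), Z[s]
    Q = G + H * S
    return {
        "I1": F + G + H,                                     # universal mean M
        "I2": F + G * (1 + (1 - 2 * Zs) / S),                # weighted cohort mean
        "I3": F + G + H * G * (1 - 2 * Zs) / Q,              # adjusted manual premium
        "I4": F + G * (1 - Zs) * (1 + (1 - Zs) / S),         # Buhlmann-Straub
        "I5": F + G * (1 - Zs) + H * (1 - Zs) ** 2,          # classical (1.9)
        "I6": F + G * (1 - Zs) + H * G * (1 - Zs) ** 2 / Q,  # optimal (4.1)
    }


rng = random.Random(0)
chains = [("I6", "I4", "I2"), ("I6", "I5", "I1"), ("I6", "I3", "I1")]
violations = 0
for _ in range(10_000):
    F, G, H = (rng.uniform(0.1, 10.0) for _ in range(3))
    n = [rng.randint(1, 20) for _ in range(rng.randint(1, 8))]
    Z = [k / (k + F / G) for k in n]
    I = mse_table(Z, rng.randrange(len(n)), F, G, H)
    for a, b, c in chains:
        if not (I[a] <= I[b] + 1e-9 and I[b] <= I[c] + 1e-9):
            violations += 1
print("violations:", violations)   # violations: 0
```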

7. Normal Hierarchical Family

A special case of interest is when all densities discussed in Section 3 are normal. If $N(a,b)$ refers to the normal density with mean $a$ and variance $b$, then by setting:

$$p(x_{it} \mid \theta_i, \varphi) = N(\theta_i, F)\ ; \qquad p(\theta_i \mid \varphi) = N(\varphi, G)\ ; \qquad p(\varphi) = N(M, H)\ , \tag{7.1}$$

we find that the universal forecast (4.1) is exactly the Bayesian conditional mean $\mathcal{E}\{\tilde x_{s,n_s+1} \mid X\}$.

Further, the adjusted manual premium, $\hat M(X)$ of (4.4), is $\mathcal{E}\{\tilde\varphi \mid X\}$. The joint distribution $p(\boldsymbol\theta \mid X)$ and the distribution $p(\varphi \mid X)$ are both normal, and their precision matrices may be found by elementary calculations.
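Under the normal specification (7.1), the claim that (4.1) is the exact conditional mean can be checked numerically by brute-force Gaussian conditioning on every observation. The covariances implied by the hierarchy are $F + G + H$ for an observation with itself, $G + H$ within a risk, and $H$ between risks. A sketch assuming NumPy is available; the function names and data are hypothetical.

```python
import numpy as np


def exact_conditional_mean(records, s, M, F, G, H):
    """E{x_{s,n_s+1} | X} under (7.1), by direct Gaussian conditioning."""
    labels = np.array([i for i, r in enumerate(records) for _ in r])  # risk of each obs.
    x = np.array([v for r in records for v in r], dtype=float)
    same_risk = np.equal.outer(labels, labels)
    # Covariances of the observations: H always, plus G within a risk,
    # plus F only on the diagonal (same risk, same time).
    cov_xx = H + G * same_risk + F * np.eye(len(x))
    # Covariances of the next observation of risk s with each observation.
    cov_yx = H + G * (labels == s)
    return M + cov_yx @ np.linalg.solve(cov_xx, x - M)


def universal_forecast(records, s, M, F, G, H):
    """Credibility forecast (4.1)."""
    Z = [len(r) / (len(r) + F / G) for r in records]
    xbar = [sum(r) / len(r) for r in records]
    M_hat = (G * M + H * sum(z * xb for z, xb in zip(Z, xbar))) / (G + H * sum(Z))
    return (1 - Z[s]) * M_hat + Z[s] * xbar[s]


records = [[3.0, 5.0], [8.0, 6.0, 7.0], [4.0]]     # hypothetical data
for s in range(len(records)):
    print(s, exact_conditional_mean(records, s, 10.0, 4.0, 2.0, 1.0),
          universal_forecast(records, s, 10.0, 4.0, 2.0, 1.0))
```

Direct conditioning needs a linear solve over all $\sum_i n_i$ observations, whereas (4.1) needs only the record lengths and sample means, which is much of the practical appeal of the credibility form.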

8. Related Work

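A linear Bayesian model which is hierarchical in form has been given by Lindley and Smith [13,14,15]. In this model, $\mathbf{x}$, $\boldsymbol\theta$, and $\boldsymbol\varphi$ are random vectors for which $\mathcal{E}\{\mathbf{x} \mid \boldsymbol\theta, \boldsymbol\varphi\} = A_1\boldsymbol\theta$ and $\mathcal{E}\{\boldsymbol\theta \mid \boldsymbol\varphi\} = A_2\boldsymbol\varphi$, with $A_1$ and $A_2$ matrices of appropriate dimension. The underlying distributions are all assumed to be multinormal, with $A_1$, $A_2$, and the covariances assumed to be known constants. When specialized to our model, results similar to those of Section 7 are obtained.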

In [18], Taylor develops a credibility model in which the "manual premium", $m$, is revised according to "the average actual claim amount per unit risk in the entire collective in the year of experience". His assumptions are different from ours, in that $m$ "has a prior distribution at the beginning of the year of experience", but "for fixed $m$, each $m(\theta_i)$ is fixed" (in our notation). I interpret this as saying, in effect, that there is a hidden parameter, $\varphi$, which is still left in $m = m(\varphi)$ after averaging over the $\theta_i$. However, I have been unable to relate the two models further, and his formulae have the disadvantage that, as "the prior distribution on $m$" becomes degenerate, his forecast does not reduce to the usual credibility formula.


9. Conclusion

In conclusion, we mention that our hierarchical model implies that the joint distribution of the risk parameters at the level of the insurance company is:

$$p(\boldsymbol\theta) = \int \prod_{i=1}^{r} p(\theta_i \mid \varphi)\, p(\varphi)\, d\varphi\ ,$$

which is equivalent to assuming that the risk parameters are exchangeable random variables. This powerful concept, due to de Finetti [5,6], is a natural modelling assumption for problems in which a random sample generates a finite population whose members are distinguishable only by their indices, as in our selection of a portfolio from an abstract collective. [14], Section 6, and [15] contain further discussions of the applicability of exchangeability. In a certain sense, what our model does is to use exchangeability to introduce correlation among the cohort $\{\theta_i\}$, in the same way that a Bayesian prior introduces correlation among successive individual samples. In both cases, this prior correlation vanishes as the actual values of $\varphi$ and $\boldsymbol\theta$ become identified.

G. Ferrara once asked how credibility experience rating could be used in a company where there are no prior statistics. By referring the prior estimation problem to a higher level of data collection, and by using all the experience data generated by the company's contracts as one learns about the actual portfolio quality, we believe that the model developed here goes a long way towards answering this question.
Abstract

The development of a Bayesian theory of regression requires special distributional assumptions and rather complicated calculations. In this paper, general formulae for predicting the mean values of the regression coefficients and the mean outcomes of future experiments are developed using the methods of credibility theory, a linearized Bayesian analysis originally used in actuarial problems. No special distributional assumptions on prior or error distributions are needed, and heteroscedastic errors in both the dependent and independent variables are permitted. The first group of formulae holds for arbitrary design matrices and dimensionality of input since, as is common in Bayesian methods, there are none of the usual problems of identifiability. However, in the event that the design matrix has full rank, the credibility results are equivalent to a linear mixture of the prior mean prediction and the classical (generalized) least-squares regression predictor; thus, the credibility result provides a bridge between full Bayesian methods and classical estimators. One can also easily find the preposterior covariance matrix for the credibility estimators, and it is shown that prior information and the results from prior experiments can be cascaded in a particularly intuitive manner. Many special applications of the credibility formulae are possible because of the generality of the assumptions.
Bayesian Regression and Credibility Theory


William  S.  Jewell 


Introduction 


Regression theory plays a fundamental role in statistical
model-building, parameter estimation, and forecasting. In
recent years, the need to incorporate prior information into
these models has stimulated the development of Bayesian methods
of regression analysis, particularly in the field of econometrics
[8,20,21,22,24,32]. However, the resulting formulae are
usually complex, and require quite stringent assumptions on
the error likelihoods and on the prior distributions of parameters.

Credibility  theory,  which  was  developed  for  a  variety  of 
simple  predictive  problems  in  insurance  [4,5,12,13,14,15,17], 
is  a  linearized  Bayesian  method  for  forecasting  mean  values 
which  circumvents  many  of  the  difficulties  of  a  full  Bayesian 
analysis;  furthermore,  in  many  cases  of  practical  interest, 
the  simplified  formulae  are  also  exact.  In  this  paper,  which 
was  stimulated  by  the  initial  work  of  Hachemeister  and  Taylor 
[10,25],  we  apply  credibility  ideas  to  the  full  range  of 
Bayesian  regression  models. 




1.  Classical  Multiple  Regression 

In the classical model of linear normal multiple regression
[8,23], we assume that an $n \times 1$ random vector of observable
output variables, $y$, satisfies the linear model

$$ y = X\beta + u , \tag{1.1} $$

where $X$ is a known $n \times k$ matrix of observations on $k$ independent
variables, called the data or design matrix, $\beta$ is a $k \times 1$ vector
of unknown regression coefficients, and $u$ is an $n \times 1$ random
vector of unobservable error variables. If we assume that $u$
is multinormally distributed, with zero mean and known covariance
matrix $C$,*

$$ \mathcal{C}\{u;u\} = \mathcal{C}\{y;y\} = \mathcal{V}\{y\} = C , \tag{1.2} $$

then it is well known that the (generalized) least-squares estimator
of $\beta$ from the $n$ observations $y$, with design matrix $X$ and
covariance matrix $C$, is given by

$$ \hat\beta(y) = (X'C^{-1}X)^{-1} X'C^{-1} y . \tag{1.3} $$

In particular, if one makes the assumption that $C$ is
diagonal, with common terms, then (1.3) has the simpler form
$\hat\beta = (X'X)^{-1}X'y$, and the common error variance need not be
known. Many other classical results are available based upon
the normality assumption (see, e.g., [8,22,23]).


*We define the (possibly non-square and unsymmetric) covariance
matrix,

$$ \mathcal{C}\{w;y\} = \mathcal{E}\{wy'\} - \mathcal{E}\{w\}\,\mathcal{E}\{y'\} , $$

for any two conformable random vectors or scalars $w$ and $y$, and
write $\mathcal{C}\{y;y\} = \mathcal{V}\{y\}$, which is usually called the covariance
matrix.
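A small numerical check of (1.3) may help. The sketch below is not from the paper: it uses exact rational arithmetic from Python's standard `fractions` module, and the helpers (`T`, `mm`, `add`, `eye`, `inv`, `gls`, `diag`) and the data are hypothetical. It shows that with $C = \sigma^2 I$ the estimator reduces to $(X'X)^{-1}X'y$ whatever the value of $\sigma^2$, while unequal error variances change the result.

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def gls(X, C, y):
    # Estimator (1.3): (X' C^-1 X)^-1 X' C^-1 y, with y an n x 1 column.
    Ci = inv(C)
    return mm(inv(mm(T(X), Ci, X)), T(X), Ci, y)

def diag(values):
    return [[F(v) if i == j else F(0) for j in range(len(values))]
            for i, v in enumerate(values)]

# Hypothetical data: intercept plus one regressor, n = 4, k = 2.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [[1], [3], [4], [8]]

beta_iid = gls(X, diag([5, 5, 5, 5]), y)   # C = 5 I
beta_ols = mm(inv(mm(T(X), X)), T(X), y)   # (X'X)^-1 X'y
beta_het = gls(X, diag([1, 1, 1, 4]), y)   # unequal variances

print(beta_ols)   # [[Fraction(7, 10)], [Fraction(11, 5)]]
print(beta_het)   # [[Fraction(35, 38)], [Fraction(71, 38)]]
```

Exact fractions are used only so that identities can be compared with `==`; a floating-point library would serve equally well in practice.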
2. Bayesian Multiple Regression


For a full Bayesian analysis, it is convenient to replace
(1.1) by an equivalent model in which the expected values of
the outputs are linear functions of the known inputs, viz.

$$ \mathcal{E}\{y \mid \theta\} = X\beta(\theta) . \tag{2.1} $$

Here $\theta$ denotes an unknown parameter which controls all the
parameters of the conditional density, or likelihood, of $y$,
given $\theta$, denoted by $p(y \mid \theta)$. The conditional covariance of $y$,
given $\theta$, will be taken as an arbitrary symmetric $n \times n$ matrix

$$ \mathcal{V}\{y \mid \theta\} = \Sigma(\theta) . \tag{2.2} $$

Given the fixed, but unknown, parameters $[\beta(\theta), \Sigma(\theta), \ldots]$,
we assume in Bayesian analysis that a prior density, $p(\theta)$, or
what is the same thing, a joint prior density, $p(\beta, \Sigma, \ldots)$, is
available. Then, a priori (i.e., prior to data), we define
the first two moments of the vector of regression coefficients
as

$$ \mathcal{E}\{\beta(\theta)\} = b ; \qquad \mathcal{V}\{\beta(\theta)\} = A , \tag{2.3} $$

and the prior expected value of the covariance matrix as*

$$ \mathcal{E}\{\Sigma(\theta)\} = \mathcal{E}\,\mathcal{V}\{y \mid \theta\} = \Sigma . \tag{2.4} $$


From these definitions, we can also obtain the prior first two
moments of the output variables, given $X$. From (2.1) and (2.3), the mean
and covariance of the conditional mean output are

$$ \mathcal{E}\{y\} = \mathcal{E}\,\mathcal{E}\{y \mid \theta\} = Xb \tag{2.5} $$

and

$$ \mathcal{V}\,\mathcal{E}\{y \mid \theta\} = D = XAX' . \tag{2.6} $$

From the covariance of the mean and the mean covariance, we
obtain the total covariance (1.2) of the output variables
prior to data as

$$ \mathcal{V}\{y\} = C = \Sigma + D = \Sigma + XAX' . \tag{2.7} $$

If multinormal and related densities are used for $p(y \mid \theta)$ and
$p(\theta)$, these are the only moments of interest.

*We use the convention that a multiple conditional expectation

$$ \mathcal{E}\,\mathcal{E}\,\mathcal{E}\{f(a,b,c) \mid b, c\} $$

means the expectation of $f$ first with respect to $p(a \mid b,c)$,
followed by expectation with respect to $p(b \mid c)$, then using $p(c)$.
Arguments may be multiple, and other operators, such as $\mathcal{V}$ and
$\mathcal{C}$, may be used. If the order is unimportant, and only $\mathcal{E}$ operators
are used, the above is, of course, $\mathcal{E}\{f(a,b,c)\}$.

Now, suppose an $n_1$-dimensional experiment is run with
design matrix $X_1$, resulting in a vector of outputs, $y = y_1$;
we denote this by $(n_1, X_1, y_1)$. Using the likelihood
$p(y_1 \mid \theta) = p(y_1 \mid \theta, X_1)$, and the prior on the parameters, $p(\theta)$,
we obtain the posterior (to the data) density $p(\theta \mid y_1) = p(\theta \mid y_1, X_1)$
in the usual way:

$$ p(\theta \mid y_1) = \frac{p(y_1 \mid \theta)\, p(\theta)}{\int p(y_1 \mid \psi)\, p(\psi)\, d\psi} , \tag{2.8} $$

where, for convenience, we suppress the known design matrix,
$X_1$.

From (2.8), the updated estimates of the parameters $\beta(\theta)$,
$\Sigma(\theta), \ldots$, are, in principle, available. For example, the expected
value of the vector of regression coefficients posterior
to the data is

$$ \mathcal{E}\{\beta(\theta) \mid y_1\} = \int \beta(\theta)\, p(\theta \mid y_1)\, d\theta , \tag{2.9} $$

and the predictive density for a future experiment $(n_2, X_2, y_2)$,
with the same parameters, but independent outputs, is

$$ p(y_2 \mid y_1) = p(y_2 \mid y_1, X_1, X_2) = \int p(y_2 \mid \theta, X_2)\, p(\theta \mid y_1)\, d\theta . \tag{2.10} $$

Because of the difficulty of carrying out (2.8)-(2.10)
for arbitrary priors and likelihoods, most of the Bayesian
regression literature makes the following additional assumptions:

(1) The likelihood, $p(y \mid \theta) = p(y \mid \theta, X)$, is multinormal for
any experiment $(n, X, y)$; thus only the parameters
$\beta = \beta(\theta)$ and $\Sigma = \Sigma(\theta)$ are involved, and (2.8) can be
restated in terms of $p(\beta, \Sigma)$;

(2) Either the Ando-Kaufmann [1] Normal-Wishart natural-conjugate
prior $p(\beta, \Sigma)$ is used to simplify the updating in (2.8);

(3) Or, $\beta$ and $\Sigma$ are assumed independent, $p(\beta, \Sigma) = p(\beta)\,p(\Sigma)$,
and simple marginal densities are chosen, typically
multinormal or non-informative (diffuse) for $\beta$, and
inverse Wishart or non-informative for $\Sigma$.

There are difficulties with all of these assumptions.
For example, the Ando-Kaufmann prior is well known to be "thin";
that is, not all possible hyperparameters in $p(\beta, \Sigma)$ can be
specified independently. And analysts are divided over the
use of non-informative priors, although in some cases they
follow from invariance or limiting arguments ([32], p. 226).

Also, computations made under these assumptions are distinctly
untidy, involving much completion of the square, matrix
manipulation, and multidimensional integration, particularly
if the full posterior parameter density, $p(\beta, \Sigma \mid y_1)$, and its
marginals are desired, or if the predictive density (2.10) is
sought [21,30,32]. The only non-trivial relaxations of the
normality assumption of which we are aware are the numerical
trials of Box and Tiao ([3], Chapter 3) with the exponential
power distribution.

In the sequel, we propose to follow a more modest course,
by concentrating on (2.9) and the related problem of predicting
the mean outcome of a future experiment, by using the linearized
ideas of credibility theory. This almost distribution-free
approach will greatly simplify the resulting formulae, and
will provide an intuitively appealing bridge between classical
and Bayesian regression techniques. And we shall see that
in many cases of practical interest, the linearized credibility
formulae are also exact Bayesian.

First  we  review  the  basic  concepts  of  credibility  theory. 
3. Credibility Theory

Credibility theory is essentially linear least-squares
applied to conditional distributions. Suppose that a $p$-dimensional
random vector, $w$, is to be forecast from a single
sample of an $r$-dimensional random vector, $y$, in the sense
of finding a $p$-dimensional vector forecast function, $f(y)$,
which minimizes the sum of the expected squared errors for
each component,

$$ H = \sum_{i=1}^{p} \int [w_i - f_i(y)]^2 \, dP(w,y) = \operatorname{tr}\mathcal{E}\{[w - f(y)][w - f(y)]'\} . \tag{3.1} $$


It is known that the integrable functions $f^*$ which minimize
(3.1), at the minimal value $H^*$, form the conditional mean vector,

$$ f^*(y) = \mathcal{E}\{w \mid y\} . \tag{3.2} $$

In many cases the exact conditional mean is difficult to calculate,
and an approximate forecast vector, $f$, is acceptable.
By completing the square, we find

$$ H = H^* + \sum_{i=1}^{p} \int [f_i^*(y) - f_i(y)]^2 \, dP(y) , \qquad H^* = \operatorname{tr}\mathcal{E}\,\mathcal{V}\{w \mid y\} , \tag{3.3} $$

so that any $f$ can also be evaluated in terms of its fit to the
conditional mean $f^*(y)$.

A convenient choice of an approximate forecast vector is
a linear function of the observables,

$$ f_i(y) = z_{i0} + \sum_{j=1}^{r} z_{ij}\, y_j \qquad (i = 1, \ldots, p) , \tag{3.4} $$

where the $p(r+1)$ coefficients $\{z_{ij}\}$, henceforth called
"credibility coefficients", are adjusted so as to minimize (3.1)
or (3.3). It is well known that the optimal values of these
coefficients are then given by $rp$ normal equations of the form

$$ \sum_{j=1}^{r} z_{ij}\, \mathcal{C}\{y_j; y_k\} = \mathcal{C}\{w_i; y_k\} \qquad (i = 1, \ldots, p;\ k = 1, \ldots, r) , \tag{3.5} $$
with the $\{z_{i0}\}$ determined so as to make the forecast (3.4)
unbiased:

$$ z_{i0} = \mathcal{E}\{w_i\} - \sum_{j=1}^{r} z_{ij}\,\mathcal{E}\{y_j\} , \quad \text{i.e.,} \quad \mathcal{E}\{f_i(y)\} = \mathcal{E}\{w_i\} \qquad (i = 1, \ldots, p) . \tag{3.6} $$


Let $z_0$ be the $p$-vector $[z_{i0}]'$, and $Z$ the $p \times r$ matrix
$[z_{ij}]$ ($j \neq 0$); then the optimal conditions (3.5)-(3.6) can be written
as

$$ Z\,\mathcal{V}\{y\} = \mathcal{C}\{w; y\} \tag{3.7} $$

and

$$ z_0 = \mathcal{E}\{w\} - Z\,\mathcal{E}\{y\} , \tag{3.8} $$

so that the optimal linear forecast (3.4) is

$$ f(y) = \mathcal{E}\{w\} + Z[y - \mathcal{E}\{y\}] , \tag{3.9} $$

and all attention can be focused on finding the credibility
matrix, $Z$, from (3.7). The minimal value of $H$ is then easily
shown to be

$$ H = \operatorname{tr}[\mathcal{V}\{w\} - Z\,\mathcal{C}\{y; w\}] . \tag{3.10} $$


Notice  that  each  component  in  (3.1)  is,  in  fact,  minimized 
independently;  we  use  matrix  notation  only  for  convenience. 
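The recipe (3.7)-(3.10) can be sketched directly: solve the normal equations for $Z$, fix $z_0$ by unbiasedness, and read off $H$. This is an illustrative sketch under the assumption that all first and second moments are supplied as exact rationals; the function `credibility_forecast`, the helpers, and the toy moments (a scalar $w$ observed twice with independent noise of variance 4, prior variance 1, prior mean 10) are hypothetical.

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def credibility_forecast(Ew, Ey, Vy, Cwy, Vw, y):
    """Optimal linear forecast of w from y, following (3.7)-(3.10).

    Ew: p x 1, Ey: r x 1, Vy: r x r, Cwy = C{w;y}: p x r, Vw: p x p, y: r x 1.
    """
    Z = mm(Cwy, inv(Vy))                     # (3.7): Z V{y} = C{w;y}
    z0 = add(Ew, mm(Z, Ey), -1)              # (3.8)
    f = add(z0, mm(Z, y))                    # (3.4) with optimal coefficients
    err_cov = add(Vw, mm(Z, T(Cwy)), -1)     # V{w} - Z C{y;w}
    H = sum(err_cov[i][i] for i in range(len(err_cov)))   # (3.10)
    return Z, z0, f, H

Z, z0, f, H = credibility_forecast(
    Ew=[[10]], Ey=[[10], [10]],
    Vy=[[5, 1], [1, 5]], Cwy=[[1, 1]], Vw=[[1]],
    y=[[13], [16]],
)
print(Z, f, H)
```

By symmetry each observation receives weight 1/6, so the total weight on the sample mean is 1/3.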

In Bayesian problems, the joint distribution of $w$ and $y$
is parametrized by a parameter $\theta$ which is not known. Therefore
the optimal $Z$ must be determined a priori, using the measure
$P(w,y) = \mathcal{E}\{P(w,y \mid \theta)\}$. Thus, the covariances in (3.7) will, in
general, consist of two terms similar to (2.7). One also
looks for special forms of $\mathcal{V}\{y\}$ which will simplify the computation
of $Z$ in (3.7) [16].


In the insurance models which gave rise to credibility
theory, there is an underlying sequence of $p$-dimensional
random vectors $\{x_1, x_2, \ldots, x_t, \ldots\}$ which are independent
and identically distributed, given a fixed, but unknown,
"risk parameter" $\theta$. The problem is to predict $\mathcal{E}\{x_{t+1} \mid x_1, x_2, \ldots, x_t\}$,
called the "experience-rated fair premium". Using the above
analysis, it is easy to show that the optimal linearized approximation
to the conditional mean is

$$ f(x_1, x_2, \ldots, x_t) = (I_p - Z_x)\,\mathcal{E}\{x\} + Z_x \Big(\frac{1}{t}\sum_{s=1}^{t} x_s\Big) , \tag{3.11} $$

where $I_p$ is the $p \times p$ unit matrix, and $Z_x$ is the $p \times p$ optimal credibility
matrix, given by

$$ Z_x (\Sigma_x + t D_x) = t D_x , \tag{3.12} $$

where $\Sigma_x$ and $D_x$ are the $p \times p$ matrix components of the covariance
of a typical $x$, defined in a manner similar to (2.4) and (2.6)
[13].


The original credibility formula was developed heuristically
by American actuaries in the '20s for a one-dimensional version
of (3.11), in which $Z$ gives the weight, or "credibility," to be
attached to the "experience" sample mean, $(\sum x_s / t)$, as opposed
to the "manual fair premium" $\mathcal{E}\{x\}$. In the one-dimensional case,
$0 < Z < 1$, and $Z$ approaches unity as the "weight of evidence",
$t$, becomes large. In the general (but nondegenerate) model,
$Z_x$ consists of $p^2$ rational functions of $t$, not restricted to
$[0,1]$; however, $Z_x \to I_p$ as $t \to \infty$, showing that ultimately the
sample mean of the $i$th component is "fully credible" for predicting
the $i$th component of the next observation.
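In the one-dimensional case, (3.12) gives $Z = tD/(\Sigma + tD)$, with $\Sigma$ and $D$ now scalars. A minimal sketch, with hypothetical function names and portfolio figures (manual premium 10, $\Sigma = 4$, $D = 1$), showing $Z$ climbing toward unity as $t$ grows:

```python
from fractions import Fraction as F

def credibility_factor(t, sigma2, d):
    # Scalar version of (3.12): Z (sigma2 + t d) = t d.
    return F(t) * d / (sigma2 + F(t) * d)

def experience_rated_premium(xs, manual, sigma2, d):
    # Scalar version of (3.11); assumes t = len(xs) >= 1 integer observations.
    t = len(xs)
    z = credibility_factor(t, sigma2, d)
    return (1 - z) * F(manual) + z * F(sum(xs), t)

for t in (1, 4, 16, 100):
    print(t, credibility_factor(t, 4, 1))   # 1/5, 1/2, 4/5, 25/26

print(experience_rated_premium([13, 16], manual=10, sigma2=4, d=1))   # 23/2
```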

Although  credibility  theory  was  originally  developed  as  an 
approximation  theory  for  mean  forecasts,  it  can  also  be  used 
as  an  approximation  theory  for  higher  moments,  or  even  for 
distributions  [4,5,11]. 

Moreover,  and  perhaps  more  importantly,  it  also  turns  out 
to be an exact theory for forecasting the mean, when the likelihood
is a member of the exponential family in which the sample
mean  is  a  sufficient  statistic,  and  when  a  natural  conjugate 
prior  is  chosen.  For  further  details,  see  [12,13,14]. 
4. Credibility Applied To Regression

We now apply the above theory to three related Bayesian
estimation problems, assuming that data from an $(n_1, X_1, y_1)$
experiment is available:

(1) the estimation of the mean regression parameters
posterior to the data;

(2) the prediction of the mean response in a future
experiment $(n_2, X_2, y_2)$;

(3) the estimation of the mean error variables in (1.1).

We shall show, with minor exceptions, that the three credibility
estimates are equivalent, and related to the classical
estimator (1.3).

4.1 Estimation of Regression Parameters

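Suppose we wish to estimate $\mathcal{E}\{\beta(\theta) \mid y_1\}$ with credibility
theory ($X_1$ is still fixed and known). Then in Section 3 we
take $w = \beta(\theta)$, $p = k$, $r = n_1$, and $y = y_1$, giving $\mathcal{E}\{w\} = b$, $\mathcal{E}\{y\} = X_1 b$,

$$ \mathcal{C}\{w; y\} = \mathcal{C}\{\beta(\theta); X_1\beta(\theta)\} = A X_1' , $$

and, from (2.7),

$$ \mathcal{V}\{y\} = C_{11} = \Sigma_{11} + X_1 A X_1' , $$

where $\Sigma_{11} = \mathcal{E}\{\Sigma_{11}(\theta)\}$ is the $n_1 \times n_1$ matrix of expected covariances
of $y_1$ during the experiment.

From (3.7), the $k \times n_1$ credibility matrix

$$ Z_\beta = A X_1' C_{11}^{-1} = A X_1' (\Sigma_{11} + X_1 A X_1')^{-1} \tag{4.1} $$

gives a linear, unbiased estimate of the posterior parameter
vector,

$$ \mathcal{E}\{\beta(\theta) \mid y_1, X_1\} \approx f_\beta(y_1, X_1) = (I_k - Z_\beta X_1)\, b + Z_\beta\, y_1 . \tag{4.2} $$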

Notice that no assumptions have been made about the distributions
$p(y \mid \theta)$ and $p(\theta)$ (except for the existence of the
indicated moments), nor about the independence of the components
of $y_1$, given $\theta$. However, $\Sigma_{11}^{-1}$ must exist for the inverse
in (4.1) to be well defined, if no special assumptions are
made about $X_1$ (see Section 4.3).
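The estimator (4.1)-(4.2) needs only $b$, $A$, $\Sigma_{11}$ and the design. The sketch below (hypothetical names and numbers, exact `fractions` arithmetic) deliberately uses a single observation with $k = 2$, a case where (1.3) cannot even be formed, to illustrate that the credibility estimate exists without a full-rank design.

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def credibility_beta(X1, Sigma11, A, b, y1):
    # Credibility estimate of E{beta | y1}, eqs. (4.1)-(4.2).
    k = len(A)
    C11 = add(Sigma11, mm(X1, A, T(X1)))          # total covariance (2.7)
    Zb = mm(A, T(X1), inv(C11))                   # (4.1), k x n1
    fb = add(mm(add(eye(k), mm(Zb, X1), -1), b), mm(Zb, y1))   # (4.2)
    return Zb, fb

# One observation (n1 = 1 < k = 2): X1 has rank 1.
Zb, fb = credibility_beta(X1=[[1, 2]], Sigma11=[[1]],
                          A=[[1, 0], [0, 1]], b=[[0], [0]], y1=[[5]])
print(fb)   # [[Fraction(5, 6)], [Fraction(5, 3)]]
```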

4.2  Prediction  of  Mean  Response  in  Future  Experiments 

Now suppose we have in mind a well-defined future experiment
$(n_2, X_2, y_2)$, and the problem is to estimate
$\mathcal{E}\{y_2 \mid y_1\} = \mathcal{E}\{y_2 \mid y_1, X_1, X_2\}$ by credibility theory. There are two possible
cases, depending on whether

$$ \Sigma_{21}(\theta) = \mathcal{C}\{y_2; y_1 \mid \theta\} , \qquad \Sigma_{21} = \mathcal{E}\{\Sigma_{21}(\theta)\} $$

are zero or not, i.e., whether knowledge of the parameter
decouples the results of past and future experiments or not.

4.2.1  No  Covariance  Between  Experiments 

In most classical regression models, there is no covariance
between past and future observations, given $\theta$, either by
assumption,  or  because  there  is  a  sufficient  interval  between 
the  two  experiments,  even  if,  say,  the  error  process  has  serial 
correlation. 

For an exact Bayesian analysis, we have from (2.1) and
(2.9):

$$ \mathcal{E}\{y_2 \mid y_1, X_1, X_2\} = X_2\, \mathcal{E}\{\beta(\theta) \mid y_1, X_1\} , \tag{4.3} $$

which shows the close relation between the two problems.

Similarly, because of the linearity of a credibility
forecast, it follows that

$$ f_{y_2}(y_1, X_1, X_2) = X_2 f_\beta(y_1, X_1) = (X_2 - Z_{y_2} X_1)\, b + Z_{y_2}\, y_1 , \tag{4.4} $$

where $Z_{y_2}$ is the $n_2 \times n_1$ credibility matrix

$$ Z_{y_2} = X_2 Z_\beta = X_2 A X_1' (\Sigma_{11} + X_1 A X_1')^{-1} . \tag{4.5} $$

In other words, when there is no covariance between experiments,
estimation of the regression coefficients by credibility is
equivalent to estimation of future response.

4.2.2  Covariance  Between  Experiments 

In the general case in which $\Sigma_{21}(\theta) \neq 0$, infrequently
considered in the literature, the complete Bayesian analysis
is more complicated, and one needs to replace the assumption
$\mathcal{E}\{y_2 \mid X_2, \theta\} = X_2\beta(\theta)$ by an equivalent assumption about
$\mathcal{E}\{y_2 \mid y_1, X_1, X_2, \theta\}$. This could be of arbitrary form, but if it
is to be in agreement with the classical multinormal results,
then we must choose the usual regression of $y_2$ on $y_1$ (see, e.g.,
[23]):

$$ \mathcal{E}\{y_2 \mid y_1, X_1, X_2, \theta\} = X_2\beta(\theta) + \Sigma_{21}(\theta)\,\Sigma_{11}^{-1}(\theta)\,[y_1 - X_1\beta(\theta)] . \tag{4.6} $$

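In an exact updating through (2.8), difficulty would arise
from the possible covariance of the terms $\Sigma_{21}(\theta)$ and $\Sigma_{11}^{-1}(\theta)$
with each other, and with $\beta(\theta)$. However, if these terms have
small covariances compared with those of $\beta(\theta)$, then one could
with small error replace these terms by their expected values,
and use the approximation

$$ \mathcal{E}\{y_2 \mid y_1, X_1, X_2, \theta\} \approx X_2\beta(\theta) + \Sigma_{21}\Sigma_{11}^{-1}[y_1 - X_1\beta(\theta)] \tag{4.7} $$

to give an exact Bayesian updating:

$$ \mathcal{E}\{y_2 \mid y_1, X_1, X_2\} = X_2\,\mathcal{E}\{\beta(\theta) \mid y_1\} + \Sigma_{21}\Sigma_{11}^{-1}[y_1 - X_1\,\mathcal{E}\{\beta(\theta) \mid y_1\}] . \tag{4.8} $$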

In the credibility approximation, the formula for $\mathcal{C}\{w; y\}$ in Section
4.2.1 is replaced by

$$ \mathcal{C}\{w; y\} = \mathcal{C}\{y_2; y_1\} = X_2 A X_1' + \Sigma_{21} , \tag{4.9} $$

so that the new credibility matrix is

$$ Z_{y_2} = (X_2 A X_1' + \Sigma_{21})(\Sigma_{11} + X_1 A X_1')^{-1} , \tag{4.10} $$

and, after some algebra, we find

$$ f_{y_2}(y_1, X_1, X_2) = X_2 f_\beta(y_1, X_1) + \Sigma_{21}\Sigma_{11}^{-1}[y_1 - X_1 f_\beta(y_1, X_1)] , \tag{4.11} $$

which is of the same form as (4.8). So, to the degree to which
(4.7) may replace (4.6), we again have a simple relation between
credibility estimates for the parameters and forecasts
for future observations.
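The "some algebra" behind (4.11) can be spot-checked numerically: the forecast built from the credibility matrix (4.10) should coincide exactly with the rearranged form (4.11). A sketch with hypothetical moment matrices and exact rational arithmetic (a check on one example, not a proof):

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical moments: n1 = 3, n2 = 1, k = 2, nonzero Sigma_21.
X1 = [[1, 0], [1, 1], [1, 2]]
X2 = [[1, 3]]
A = [[2, 1], [1, 1]]
b = [[1], [0]]
S11 = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
S21 = [[0, 1, 1]]
y1 = [[2], [1], [4]]

C11 = add(S11, mm(X1, A, T(X1)))
dev = add(y1, mm(X1, b), -1)                       # y1 - E{y1}
Zy2 = mm(add(mm(X2, A, T(X1)), S21), inv(C11))     # (4.10)
f_direct = add(mm(X2, b), mm(Zy2, dev))            # (3.9) with w = y2
fb = add(b, mm(A, T(X1), inv(C11), dev))           # (4.1)-(4.2)
f_411 = add(mm(X2, fb), mm(S21, inv(S11), add(y1, mm(X1, fb), -1)))  # (4.11)
print(f_direct == f_411)   # True
```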

4.3 Relationship to Classical Regression Estimation

In classical regression, emphasis is placed upon having
sufficient observations to fully identify all of the regression
parameters, i.e., $n_1 \geq k$, and $X_1$ has full rank $k$; the necessity
for this can be seen from the classical estimator (1.3).

On the other hand, in the Bayesian credibility model,
it can be seen from (4.1)-(4.2) that the finiteness of $b$,
$\Sigma_{11}$, and $A$ is sufficient to guarantee the existence of an
estimator for $\beta$; one sample will revise the prior estimate
$b$, even if $X_1$ does not have full rank! In fact, if $n_1$ is small,
the calculation of $(\Sigma_{11} + X_1 A X_1')^{-1}$ is particularly simple.

However, to relate our results to classical theory, we
shall henceforth assume that $n_1 \geq k$, and $\operatorname{rank}(X_1) = k$, and
use the following result which Bodewig ([2], pp. 39, 218) attributes
to H. Hemes, and which is also given by Tocher [29]
(see also Lindley and Smith [19], pp. 6 and 34, for two later
attributions).

Theorem. If $\alpha$ and $\beta$ are $n \times k$ matrices, then

$$ (I_n + \alpha\beta')^{-1} = I_n - \alpha (I_k + \beta'\alpha)^{-1} \beta' , \tag{4.12} $$

whenever either of the indicated inverses exists.

The fact that the determinants of the two terms in parentheses
are identical shows that the existence of one inverse
implies the existence of the other.
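The identity (4.12) and the accompanying determinant remark are easy to confirm on a small example. In this sketch $\alpha$ and $\beta$ are arbitrary hypothetical $3 \times 2$ matrices (the $\beta$ here is the generic matrix of the theorem, not the regression vector), and `det` is a hypothetical helper computing exact determinants by elimination.

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def det(M):
    # Exact determinant by Gaussian elimination with row swaps.
    a = [[F(x) for x in row] for row in M]
    n, d = len(a), F(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return F(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return d

alpha = [[1, 0], [0, 1], [1, 1]]   # n x k with n = 3, k = 2
beta = [[2, 1], [0, 1], [1, 0]]
n, k = 3, 2

lhs = inv(add(eye(n), mm(alpha, T(beta))))
rhs = add(eye(n), mm(alpha, inv(add(eye(k), mm(T(beta), alpha))), T(beta)), -1)
print(lhs == rhs)                                      # True
print(det(add(eye(n), mm(alpha, T(beta)))),
      det(add(eye(k), mm(T(beta), alpha))))            # 7 7
```

The practical point of (4.12) is visible in the shapes: the left side inverts an $n \times n$ matrix, the right side only a $k \times k$ one.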

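If we apply this to $C_{11}^{-1}$ with $\alpha = X_1$ and $\beta' = A X_1' \Sigma_{11}^{-1}$, we
get

$$ C_{11}^{-1} = (\Sigma_{11} + X_1 A X_1')^{-1} = \Sigma_{11}^{-1} - \Sigma_{11}^{-1} X_1 (I_k + A X_1' \Sigma_{11}^{-1} X_1)^{-1} A X_1' \Sigma_{11}^{-1} . \tag{4.13} $$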
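Defining the two $k \times k$ matrices

$$ e_{11} = (X_1' \Sigma_{11}^{-1} X_1)^{-1} , \tag{4.14} $$

$$ z_\beta = (I_k + e_{11} A^{-1})^{-1} = A (A + e_{11})^{-1} = (e_{11}^{-1} + A^{-1})^{-1} e_{11}^{-1} , \tag{4.15} $$

we obtain finally

$$ Z_\beta = z_\beta\, e_{11} X_1' \Sigma_{11}^{-1} , \qquad Z_\beta X_1 = z_\beta ; \tag{4.16} $$

and (4.2) and (4.4) become

$$ f_\beta(y_1, X_1) = (I_k - z_\beta)\, b + z_\beta\, \hat\beta_1(y_1) ; \tag{4.17} $$

$$ f_{y_2}(y_1, X_1, X_2) = X_2 [(I_k - z_\beta)\, b + z_\beta\, \hat\beta_1(y_1)] ; \tag{4.18} $$

with a $k$-dimensional vector estimator for $\beta$ of

$$ \hat\beta_1(y_1) = (X_1' \Sigma_{11}^{-1} X_1)^{-1} X_1' \Sigma_{11}^{-1} y_1 = e_{11} X_1' \Sigma_{11}^{-1} y_1 . \tag{4.19} $$

This rearrangement requires $\operatorname{rank}(e_{11}) = k$.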

(4.17) is, from an aesthetic viewpoint, extremely satisfying,
for it shows the familiar credibility mixing between
the prior mean parameter vector, $b$, and a sample statistic,
$\hat\beta_1(y_1)$, in a manner similar to the multidimensional credibility
formula (3.11), and extensions of it to other sample statistics
[12] [13]. Only a small credibility matrix, $z_\beta$, need be calculated
from (4.15), and its size depends only on the number
of parameters to be estimated, not the number of data points.

Of course, one must calculate $\Sigma_{11}^{-1}$, but this is needed in any
regression problem, and $\Sigma_{11}$ is often assumed to be of diagonal
form. There is an obvious parallel between (4.15) and (3.12).
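The reduction from the $k \times n_1$ matrix $Z_\beta$ to the small $k \times k$ matrix $z_\beta$ can be checked on toy data. The sketch below (hypothetical data and helper names, exact arithmetic) computes (4.1)-(4.2) directly and compares them with (4.14)-(4.17) and (4.19).

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical full-rank design: n1 = 4, k = 2.
X1 = [[1, 0], [1, 1], [1, 2], [1, 4]]
S11 = [[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 2]]
A = [[3, 1], [1, 2]]
b = [[1], [1]]
y1 = [[0], [2], [3], [7]]

S11i = inv(S11)
C11 = add(S11, mm(X1, A, T(X1)))
Zb = mm(A, T(X1), inv(C11))                       # (4.1), k x n1
fb_42 = add(b, mm(Zb, add(y1, mm(X1, b), -1)))    # (4.2)

e11 = inv(mm(T(X1), S11i, X1))                    # (4.14)
zb = mm(A, inv(add(A, e11)))                      # (4.15), middle form
beta_hat = mm(e11, T(X1), S11i, y1)               # (4.19)
fb_417 = add(mm(add(eye(2), zb, -1), b), mm(zb, beta_hat))   # (4.17)

print(mm(Zb, X1) == zb, fb_42 == fb_417)          # True True
```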

There remains to explain the relation between the
estimator $\hat\beta_1(y_1)$ in (4.19), and the classical estimator $\hat\beta(y_1)$
in (1.3), for, as we know, the latter should be used with the
total covariance $C_{11} = \Sigma_{11} + X_1 A X_1'$. However, a simple calculation
will show that the second term is annihilated in the
least-squares form, so that

$$ (X_1' C_{11}^{-1} X_1)^{-1} X_1' C_{11}^{-1} = (X_1' \Sigma_{11}^{-1} X_1)^{-1} X_1' \Sigma_{11}^{-1} , \tag{4.20} $$

and it is a matter of indifference how the estimator is
calculated.
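A one-example check of (4.20): the generalized least-squares operator built with the total covariance $C_{11}$ equals the one built with $\Sigma_{11}$ alone. The matrices and the helper `gls_operator` are hypothetical; exact arithmetic makes the comparison an equality rather than a tolerance test.

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def gls_operator(X, C):
    # (X' C^-1 X)^-1 X' C^-1, the matrix applied to y in (1.3).
    Ci = inv(C)
    return mm(inv(mm(T(X), Ci, X)), T(X), Ci)

X1 = [[1, 1], [1, 2], [1, 3]]
S11 = [[2, 1, 0], [1, 3, 0], [0, 0, 1]]     # non-diagonal Sigma_11
A = [[5, 2], [2, 1]]
C11 = add(S11, mm(X1, A, T(X1)))

print(gls_operator(X1, C11) == gls_operator(X1, S11))   # True
```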

4.4 Estimation of Error Variables

After a regression model has been calibrated, it is often
useful to verify the assumptions of the model by examining the
residual vector, $y_1 - X_1 f_\beta(y_1, X_1)$.

One can also think of estimating the true value of the
error variables, $u_1$, in (1.1) by using Bayesian analysis [33].
Using the credibility approach, we first find $\mathcal{E}\{u_1\} = 0$,
$\mathcal{V}\{u_1\} = \mathcal{C}\{u_1; y_1\} = \Sigma_{11}$, and then find the mean estimate

$$ f_{u_1}(y_1, X_1) = y_1 - X_1 f_\beta(y_1, X_1) , \tag{4.21} $$

which is exactly the vector of residuals! This might have been
expected from first principles.

Perhaps it is worth pointing out that [6, Appendix 3]

$$ \mathcal{C}\{u_1 - f_{u_1}(y_1, X_1); f_{u_1}(y_1, X_1)\} = 0 . \tag{4.22} $$
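Equation (4.21) states that the credibility estimate of $u_1$, namely $\Sigma_{11} C_{11}^{-1}(y_1 - X_1 b)$ obtained from (3.9) with $w = u_1$, coincides with the residual vector about $f_\beta$. A sketch verifying this on hypothetical numbers with exact arithmetic:

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

X1 = [[1, 0], [1, 1], [1, 3]]
S11 = [[1, 0, 0], [0, 2, 0], [0, 0, 1]]
A = [[1, 0], [0, 4]]
b = [[2], [1]]
y1 = [[1], [4], [6]]

C11 = add(S11, mm(X1, A, T(X1)))
dev = add(y1, mm(X1, b), -1)                 # y1 - E{y1}
# Credibility estimate of u1: E{u1} = 0, C{u1;y1} = Sigma_11, V{y1} = C11.
f_u = mm(S11, inv(C11), dev)
# Residuals about the credibility parameter estimate (4.2).
fb = add(b, mm(A, T(X1), inv(C11), dev))
residual = add(y1, mm(X1, fb), -1)
print(f_u == residual)   # True
```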


5.  Estimation  Error  Covariances--Limiting  Cases 


It  is  of  interest  to  compute  the  improvement  in  estimation 
to  be  expected  from  the  credibility  formulae. 

For the regression parameters, let the estimation error
covariance matrix be

$$ \Phi_\beta(X_1) = \mathcal{E}\{[\beta(\theta) - f_\beta(y_1, X_1)][\beta(\theta) - f_\beta(y_1, X_1)]'\} = \mathcal{V}\{\beta(\theta) - f_\beta(y_1, X_1)\} , \tag{5.1} $$

because the estimator is unbiased, a priori.

By elementary calculations based on Sections 3 and 4,
we find that the minimal "preposterior" value is the analog
of the term in square brackets in (3.10):

$$ \Phi_\beta(X_1) = A - Z_\beta X_1 A = (I_k - z_\beta) A = z_\beta\, e_{11} . \tag{5.2} $$

Remember that only the diagonal terms of $\Phi_\beta$ are (independently)
minimized in using (3.1), $H = \operatorname{tr}\Phi_\beta$.

For the prediction of mean future response, we find in
the no-covariance case of Section 4.2.1:

$$ \Phi_{y_2}(X_1, X_2) = \mathcal{V}\{y_2 - f_{y_2}(y_1, X_1, X_2)\} = \Sigma_{22} + X_2 (I_k - z_\beta) A X_2' = \Sigma_{22} + X_2 z_\beta e_{11} X_2' . \tag{5.3} $$

The result with covariance between experiments is similar,
with additional terms involving $\Sigma_{21}$.

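The preposterior estimate of the covariance matrix of the
residual vector (4.21) is

$$ \Phi_u(X_1) = \mathcal{V}\{u_1 - f_{u_1}(y_1, X_1)\} = \Sigma_{11} - \Sigma_{11} C_{11}^{-1} \Sigma_{11} = X_1 z_\beta\, e_{11} X_1' . \tag{5.4} $$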


Without an initial experiment, the value of $z_\beta$ would be
zero, and from (4.17), (4.18), (4.21) we would have the prior
means, $b$ and $X_2 b$, and the residual $y_1 - X_1 b$ about the prior mean, as predictors, and (5.2), (5.3), (5.4) would
be equal to the appropriate total prior covariance matrices,
$A$, $\Sigma_{22} + X_2 A X_2'$, and $X_1 A X_1'$, respectively.
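The preposterior covariances can be computed either from the generic form $\mathcal{V}\{w\} - Z\,\mathcal{C}\{y; w\}$ of (3.10) or from compact expressions in $z_\beta$ and $e_{11}$, namely $(I_k - z_\beta)A = z_\beta e_{11}$ for the parameters, $\Sigma_{22} + X_2 z_\beta e_{11} X_2'$ for the forecast, and $X_1 z_\beta e_{11} X_1'$ for the errors. This sketch (hypothetical matrices, no covariance between experiments, exact arithmetic) checks that the two routes agree.

```python
from fractions import Fraction as F

def T(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(r) for r in zip(*M)]

def mm(*Ms):
    # Left-to-right product M1 @ M2 @ ... (exact when entries are Fractions).
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def add(A, B, s=1):
    # Elementwise A + s*B.
    return [[a + s * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def eye(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def inv(M):
    # Exact Gauss-Jordan inverse; raises ValueError if M is singular.
    n = len(M)
    aug = [[F(x) for x in row] + e for row, e in zip(M, eye(n))]
    for c in range(n):
        p = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if p is None:
            raise ValueError("singular matrix")
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

X1 = [[1, 0], [1, 1], [1, 2]]
X2 = [[1, 4]]
S11 = [[1, 0, 0], [0, 1, 0], [0, 0, 2]]
S22 = [[3]]
A = [[2, 0], [0, 1]]

S11i = inv(S11)
C11 = add(S11, mm(X1, A, T(X1)))
Zb = mm(A, T(X1), inv(C11))
e11 = inv(mm(T(X1), S11i, X1))
zb = mm(A, inv(add(A, e11)))
I2 = eye(2)

# Generic route: V{w} - Z C{y;w} for w = beta, y2 and u1 in turn.
Phi_beta = add(A, mm(Zb, X1, A), -1)
Phi_y2 = add(add(S22, mm(X2, A, T(X2))), mm(X2, Zb, X1, A, T(X2)), -1)
Phi_u = add(S11, mm(S11, inv(C11), S11), -1)

print(Phi_beta == mm(zb, e11),
      Phi_y2 == add(S22, mm(X2, zb, e11, T(X2))),
      Phi_u == mm(X1, zb, e11, T(X1)))   # True True True
```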

Similarly, if the first experiment is performed under
poor observational conditions, then the diagonal elements of
$\Sigma_{11}$ would be much larger than those of $X_1 A X_1'$. We see directly
that $z_\beta$ would be nearly zero, and there would be a vote of "no confidence"
in the estimator $\hat\beta_1(y_1)$, and $b$, $X_2 b$, and $y_1 - X_1 b$ would
again be the minimum-variance predictors for $\beta(\theta)$, $y_2$, and $u_1$,
respectively.

However, conversely, if the diagonal elements of $A$ are
very large compared to those of $e_{11}$, this means that our prior
knowledge is very imprecise compared to the error conditions
of the experiment; $A^{-1} \to 0$ is the credibility equivalent of the
"diffuse prior" assumptions often made in Bayesian analysis.
In this case, we see that $z_\beta \to I_k$; "full credibility" is attached
to the classical estimator $\hat\beta_1(y_1)$, and the prior mean, $b$, is
given zero weight. There remain only the irreducible error
covariances $e_{11}$ in estimating $\beta(\theta)$, $\Sigma_{22} + X_2 e_{11} X_2'$ in predicting
$y_2$, and $X_1 e_{11} X_1'$ in estimating $u_1$.

Also, if we consider experiments with increasing $n_1$, then,
under certain natural conditions, such as:

(1) The elements of $\Sigma_{11}$ are bounded, for all $n_1$;

(2) The design matrix, $X_1$, "fills out" a finite range
of the x-axis in a stable manner, as $n_1$ increases;

it is easy to show that the elements of $e_{11}$ in (4.14) are
bounded by a function which diminishes as $n_1$ grows, that is, $z_\beta$
approaches $I_k$ as $n_1$ increases (see, e.g., [18]). In practical
terms, this means that an increasing number of initial sample
points can reduce the preposterior covariance in estimating
the regression parameter (5.2) as close to zero as desired;
however, there will always be an irreducible covariance $\Sigma_{22}$
in making forecasts (5.3). The covariance matrix $\Phi_u(X_1)$ in
(5.4) continues to grow in dimension, and depends in a complicated
manner upon the actual structure of $X_1$.
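The limiting behaviour described above can be seen in the simplest hypothetical model $y_t = \beta + u_t$ ($t = 1, \ldots, n_1$), with $X_1$ a column of ones, $\Sigma_{11} = \sigma^2 I$ and scalar prior variance $A = a$, for which $e_{11} = \sigma^2/n_1$ and $z_\beta = a/(a + \sigma^2/n_1)$. A sketch (function name and numbers are illustrative):

```python
from fractions import Fraction as F

def z_beta_mean_model(a, sigma2, n1):
    # z_beta from (4.15) for y_t = beta + u_t with V{beta} = a and
    # Sigma_11 = sigma2 * I, so that e_11 = sigma2 / n1 (integers assumed).
    e11 = F(sigma2, n1)
    return F(a) / (F(a) + e11)

# Growing sample size with a = 1, sigma2 = 4: z_beta -> 1, Phi_beta -> 0.
for n1 in (1, 4, 36, 396):
    z = z_beta_mean_model(a=1, sigma2=4, n1=n1)
    print(n1, z, 1 - z)          # z_beta and Phi_beta = (1 - z_beta) * a

# Diffuse prior (large a) at fixed n1 = 4 also pushes z_beta toward 1.
print(z_beta_mean_model(a=99, sigma2=4, n1=4))   # 99/100
```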
6. Random Design Matrices


In many applications, $X_1$ and/or $X_2$ must be considered as
random, either as a result of an uncontrollable input, because
the effective input cannot be precisely observed, or because
of deliberate randomization. There are many special cases in
the literature (see, e.g., [7,32]); we shall derive general
credibility results, and indicate only a few of the possible
specializations. Special attention must be paid to whether
$X_1$, $X_2$, or both are random variables, so throughout this section
we shall indicate the status of all inputs and outputs explicitly.
We start with two simpler cases.

6.1 $X_2$ Random and Independent of Fixed Initial Experiment

If the future design matrix $X_2$ is random, but independent
of the fixed initial experiment $(n_1, X_1, y_1)$, then the problem
of estimating the regression parameters is unchanged from
Section 4.1.

However, to predict the mean response of the second experiment,
we must now calculate a credibility approximation to
$\mathcal{E}\{y_2 \mid y_1, X_1\} = \mathcal{E}\,\mathcal{E}\{y_2 \mid y_1, X_1, X_2\}$. Assuming, for simplicity,
observationally unrelated experiments ($\Sigma_{21} = 0$), we have
from (2.1) and Section 4.2.1,

$$ \mathcal{E}\{y_2\} = \mathcal{E}[\mathcal{E}\{X_2 \mid \theta\} \cdot \beta(\theta)] , \tag{6.1} $$

and

$$ \mathcal{C}\{y_2; y_1 \mid X_1\} = \mathcal{C}\{\mathcal{E}\{X_2 \mid \theta\} \cdot \beta(\theta); X_1\beta(\theta)\} . \tag{6.2} $$

Since $\mathcal{V}\{y_1 \mid X_1\} = \Sigma_{11} + X_1 A X_1'$ and $\mathcal{E}\{y_1 \mid X_1\} = X_1 b$ still, the
only effect in this case has been to modify the first term,
$X_2 A X_1'$, in the definition of $Z_{y_2}$ in (4.5) to the form in (6.2),
and to replace the $X_2 b$ term in (4.4) by (6.1).

An important special case is:

Assumption I. Any random $X$ is statistically independent of $\theta$. (6.3)

In this case, we see directly that $\mathcal{E}\{y_2\} = \mathcal{E}\{X_2\}\, b$ and
$\mathcal{C}\{y_2; y_1 \mid X_1\} = \mathcal{E}\{X_2\}\, A X_1'$, so that the results of Section
4.2.1 apply with $X_2$ replaced by its expected value!

6.2 Estimation of Regression Parameters when $X_1$ is Random

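If $X_1$ is random, then to estimate $\beta(\theta)$ we must use the joint
density $p(y_1, X_1 \mid \theta)$ and generalize (4.2). For the mean outcome
of the initial experiment,

$$ \mathcal{E}\{y_1\} = \mathcal{E}\{\mathcal{E}\{X_1 \mid \theta\} \cdot \beta(\theta)\} , \tag{6.4} $$

but the covariance of $y_1$ now has three terms:

$$ \mathcal{V}\{y_1\} = \mathcal{E}\{\Sigma_{11}(X_1, \theta)\} + \mathcal{V}\{\mathcal{E}\{X_1 \mid \theta\} \cdot \beta(\theta)\} + \mathcal{E}\,\mathcal{V}\{X_1\beta(\theta) \mid \theta\} , \tag{6.5} $$

where

$$ \Sigma_{11}(X_1, \theta) = \mathcal{V}\{y_1 \mid X_1, \theta\} \tag{6.6} $$

shows explicitly the possible dependence of the conditional
observational covariance both on the design $X_1$ and on $\theta$.

(For consistency, we shall assume in the next section that
neither (6.4) nor (6.6) can, however, depend upon the future
values $(y_2, X_2)$.)

Since $\beta(\theta)$ is constant, given $\theta$, there is still only one
term in

$$ \mathcal{C}\{\beta(\theta); y_1\} = \mathcal{C}\{\beta(\theta); \mathcal{E}\{X_1 \mid \theta\} \cdot \beta(\theta)\} . \tag{6.7} $$

This form and the first two terms in (6.5) are easily seen to
be the generalizations of $A X_1'$ and $\Sigma_{11} + X_1 A X_1'$, respectively,
as used in Section 4.1.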

However, the last term in (6.5) is new; call it $U$. It
has components

$$ U_{tu} = \sum_{i=1}^{k}\sum_{j=1}^{k} \mathcal{E}\{\beta_i(\theta)\,\beta_j(\theta)\, \mathcal{C}\{x_{ti}; x_{uj} \mid \theta\}\} \qquad (t, u = 1, 2, \ldots, n_1) , \tag{6.8} $$

and thus contains information about the conditional covariances
between independent variables.

In many models, such as "errors-in-the-variables," or
"target inputs" [7], successive inputs are independent, or
have independent errors around fixed means, expressible as:

Assumption II. Rows of any random $X$ are statistically independent. (6.9)

In this case, it follows that $U$ is diagonal. Additionally, we
point out that in many regression designs, the first column
of $X_1$ is non-random (consisting entirely of 1's), so that the
summations in (6.8) would begin with $i = 2$ and $j = 2$.

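If Assumption I is taken also to apply to $X_1$,

$$\mathcal{V}\{y_1\} = \Sigma_{11} + \mathcal{E}\{X_1\}\,A\,\mathcal{E}\{X_1'\} + U, \qquad \Sigma_{11} = \mathcal{E}\{\Sigma_{11}(X_1, \theta)\}; \qquad (6.10)$$

$$\mathcal{C}\{\beta(\theta); y_1\} = A\,\mathcal{E}\{X_1'\};$$

and the main effect on the credibility estimate (4.1), apart from replacing $X_1$ by its mean value and defining a more general average covariance $\Sigma_{11}$, is to add a diagonal matrix $U$ to the covariance of $y_1$, with terms

$$U_{tt} = \sum_{i=1}^{k}\sum_{j=1}^{k} (A_{ij} + b_i b_j)\,\mathcal{C}\{x_{ti}; x_{tj}\} \qquad (t = 1, 2, \ldots, n_1). \qquad (6.11)$$

This modifies the credibility formulae in an obvious manner, and we see that the estimator to be used in (4.17) becomes

$$\hat{\beta}(y_1) = \left[\mathcal{E}\{X_1'\}(\Sigma_{11} + U)^{-1}\mathcal{E}\{X_1\}\right]^{-1}\mathcal{E}\{X_1'\}(\Sigma_{11} + U)^{-1}\,y_1, \qquad (6.12)$$

with the new interpretation of $\Sigma_{11}$ from (6.10), and a new

$$e_1^{-1} = \mathcal{E}\{X_1'\}(\Sigma_{11} + U)^{-1}\mathcal{E}\{X_1\} \qquad (6.13)$$

used to define $z_1$ in (4.15).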
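A minimal numerical sketch of (6.11) to (6.13), assuming Assumptions I and II hold so that $U$ is diagonal. The input format (a list `row_covs` giving the covariance matrix of each row of $X_1$) and the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def random_design_estimator(y1, EX1, row_covs, Sigma11, b, A):
    """Sketch of (6.11)-(6.13) for a random design X_1 (Assumptions I and II).

    EX1      : n1 x k mean design E{X_1}
    row_covs : hypothetical list of n1 (k x k) matrices, row_covs[t] = Cov{x_t}
    Sigma11  : n1 x n1 average observational covariance E{Sigma_11(X_1, theta)}
    b, A     : prior mean and covariance of beta(theta)
    """
    second_moment = A + np.outer(b, b)          # E{beta_i beta_j} = A_ij + b_i b_j
    # Diagonal U: U_tt = sum_ij (A_ij + b_i b_j) Cov{x_ti; x_tj}, eq. (6.11)
    U = np.diag([np.sum(second_moment * C) for C in row_covs])
    W = np.linalg.inv(Sigma11 + U)              # (Sigma_11 + U)^{-1}
    e1 = np.linalg.inv(EX1.T @ W @ EX1)         # eq. (6.13)
    beta_hat = e1 @ EX1.T @ W @ y1              # eq. (6.12)
    return beta_hat, e1, U

# Illustrative numbers: intercept column non-random, second input noisy.
b = np.array([1.0, 2.0])
A = np.eye(2)
EX1 = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
C = np.diag([0.0, 0.5])
y1 = np.array([1.0, 3.0, 5.0])
beta_hat, e1, U = random_design_estimator(y1, EX1, [C, C, C], np.eye(3), b, A)
```

Because every row has the same input variance here, $U = 2.5\,I$ and the estimate reduces to least squares on the mean design; unequal row variances would reweight the rows.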

6.3 General Case

In the general case, when all inputs and outputs are random, we must work with the joint density $p(y_1, X_1, y_2, X_2 \mid \theta)$, and be extremely careful about the assumptions of dependence and independence which are appropriate to the model under consideration. Different models may lead to different conditional decompositions of this joint density.

Usually the regression parameters are estimated after the initial experiment, so that the results of Section 6.2 apply. If both experiments are performed, then the total data may be pooled, and the same results apply with obvious modification (see Section 7).

Therefore the central problem of interest in credibility theory will be to predict $\mathcal{E}\{y_2 \mid y_1\}$, for which we need $\mathcal{E}\{y_1\}$, $\mathcal{V}\{y_1\}$, $\mathcal{E}\{y_2\}$, and $\mathcal{C}\{y_2; y_1\}$. (6.4) and (6.5) still apply because the data-gathering experiment is prior to the one for which the prediction is made. However, to compute $\mathcal{E}\{y_2\}$ we need an assumption such as (4.7) to specify a form for $\mathcal{E}\{y_2 \mid y_1, X_1, X_2, \theta\}$. Given this, we then uncondition in any convenient way, say

$$\mathcal{E}\{y_2\} = \mathcal{E}\{\mathcal{E}\{y_2 \mid X_2, y_1, X_1, \theta\}\}, \qquad (6.14)$$

using any other simplifications, such as Assumption I, which apply. Further reduction will need a careful analysis of the experimental conditions; for example:

Assumptions III (a), (b), or (c). The choice of the future design, $X_2$, given $\theta$, depends only on (a) the past input, $X_1$; or (b) the past output, $y_1$; or (c) both $(X_1, y_1)$. (6.15)


III (a) might obtain if $(X_1, X_2)$ were part of the same predetermined experimental design, or if errors in the independent variables were serially correlated; III (b) might be correct if the future input values depended upon the previous outputs, or perhaps on some estimator of $\beta(\theta)$, such as (4.2), as generalized in Section 6.2.
For the RHS of (3.7), repeated application of the principle of conditional covariance leads to

$$\mathcal{C}\{y_2; y_1\} = \mathcal{E}\{\Sigma_{21}(X_2, X_1, \theta)\} + \mathcal{E}\{\mathcal{C}\{\mathcal{E}\{y_2 \mid X_2, X_1, \theta\}; X_1\beta(\theta) \mid \theta\}\} + \mathcal{C}\{\mathcal{E}\{\mathcal{E}\{y_2 \mid X_2, X_1, \theta\} \mid \theta\}; \mathcal{E}\{X_1 \mid \theta\}\,\beta(\theta)\}, \qquad (6.16)$$

where the arguments $(X_2, X_1, \theta)$ show that the covariance of observational errors between $y_2$ and $y_1$ can now depend upon both inputs; one possible term in (6.16) is missing because we still assume $\mathcal{E}\{y_1 \mid X_2, X_1, \theta\} = X_1\beta(\theta)$. Further simplification depends upon using forms such as (4.7), and clarifying the experimental relationships between $\theta$, $X_1$, and $X_2$.
7. Prior Information and Prior Experiments

The distinction between prior information, in the usual Bayesian sense, and the information obtained as the result of a prior experiment is not clear-cut. Suppose we are given prior information $(b, A)$ about $\beta(\theta)$, and the matrix of observation error covariances $\Sigma$ for any $(n, X)$. A first experiment $(n_1, X_1, y_1)$ then provides a further estimate of $\beta(\theta)$, which supplements our knowledge prior to the performance of a second experiment $(n_2, X_2, y_2)$; thus, there is total prior information $(b, A; \Sigma_{11}; n_1, X_1, y_1)$ as input to the second stage. On the other hand, we know that the estimation of $\beta(\theta)$ after two experiments can be regarded as a combined single experiment, and it is interesting to examine further the relationship between these two viewpoints.

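To estimate $\mathcal{E}\{\beta(\theta) \mid y_1, X_1; y_2, X_2\}$, we form the enlarged versions of (2.1) and (2.2):

$$\mathcal{E}\left\{\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \,\middle|\, \theta\right\} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\beta(\theta); \qquad (7.1)$$

$$\mathcal{V}\left\{\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \,\middle|\, \theta\right\} = \begin{bmatrix} \Sigma_{11}(\theta) & 0 \\ 0 & \Sigma_{22}(\theta) \end{bmatrix}, \qquad (7.2)$$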


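where we have assumed the two experiments are observationally independent and the design matrices are fixed. Then, following the analysis of Section 4.1, we find an enlarged $Z_1$-type $k \times (n_1 + n_2)$ credibility matrix, $Z_{1,2}$, for the combined experiment,

$$Z_{1,2} = A\,[X_1' \;\; X_2']\begin{bmatrix} \Sigma_{11} + X_1AX_1' & X_1AX_2' \\ X_2AX_1' & \Sigma_{22} + X_2AX_2' \end{bmatrix}^{-1}, \qquad (7.3)$$

which is then used in the estimate

$$\mathcal{E}\{\beta(\theta) \mid y_1, X_1; y_2, X_2\} \approx f_\beta(y_1, X_1; y_2, X_2) = \left(I_k - Z_{1,2}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}\right)b + Z_{1,2}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}. \qquad (7.4)$$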
If we define individual $Z_1$-type matrices for each of the experiments,

$$Z_i = AX_i'(\Sigma_{ii} + X_iAX_i')^{-1} \qquad (i = 1, 2), \qquad (7.5)$$

then the combined credibility matrix can be written in a simpler form:

$$Z_{1,2} = [Z_1 \;\; Z_2]\begin{bmatrix} I_{n_1} & X_1Z_2 \\ X_2Z_1 & I_{n_2} \end{bmatrix}^{-1} = [Z_1 \;\; Z_2]\begin{bmatrix} (I_{n_1} - X_1Z_2X_2Z_1)^{-1} & -(I_{n_1} - X_1Z_2X_2Z_1)^{-1}X_1Z_2 \\ -(I_{n_2} - X_2Z_1X_1Z_2)^{-1}X_2Z_1 & (I_{n_2} - X_2Z_1X_1Z_2)^{-1} \end{bmatrix}. \qquad (7.6)$$
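The block form (7.6) can be checked numerically against the direct form (7.3). The sketch below uses random designs and an arbitrary positive-definite $A$ (illustrative assumptions, not data from the text) and also checks the explicit block inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n1, n2 = 2, 3, 4
A = np.array([[2.0, 0.3], [0.3, 1.0]])           # illustrative prior covariance
X1, X2 = rng.normal(size=(n1, k)), rng.normal(size=(n2, k))
S11 = np.diag(rng.uniform(0.5, 1.5, n1))           # Sigma_11
S22 = np.diag(rng.uniform(0.5, 1.5, n2))           # Sigma_22

def Z_single(X, S):
    """Z_i = A X_i' (Sigma_ii + X_i A X_i')^{-1}, eq. (7.5)."""
    return A @ X.T @ np.linalg.inv(S + X @ A @ X.T)

# Direct form (7.3): one (n1 + n2) x (n1 + n2) inverse.
M = np.block([[S11 + X1 @ A @ X1.T, X1 @ A @ X2.T],
              [X2 @ A @ X1.T, S22 + X2 @ A @ X2.T]])
Z12_direct = A @ np.hstack([X1.T, X2.T]) @ np.linalg.inv(M)

# First equality of (7.6), built from the single-experiment matrices.
Z1, Z2 = Z_single(X1, S11), Z_single(X2, S22)
B = np.block([[np.eye(n1), X1 @ Z2], [X2 @ Z1, np.eye(n2)]])
Z12_block = np.hstack([Z1, Z2]) @ np.linalg.inv(B)

# Explicit block inverse, second equality of (7.6).
P = np.linalg.inv(np.eye(n1) - X1 @ Z2 @ X2 @ Z1)
R = np.linalg.inv(np.eye(n2) - X2 @ Z1 @ X1 @ Z2)
B_inv = np.block([[P, -P @ X1 @ Z2], [-R @ X2 @ Z1, R]])
```

The factorization works because the big matrix in (7.3) equals the block matrix in (7.6) times $\mathrm{diag}(\Sigma_{11} + X_1AX_1',\, \Sigma_{22} + X_2AX_2')$.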


Further simplification requires the assumption of full rank for $X_1$ and $X_2$, and the definitions (see (4.14), (4.15))

$$e_i^{-1} = X_i'\Sigma_{ii}^{-1}X_i; \qquad z_i = A(A + e_i)^{-1} \qquad (i = 1, 2). \qquad (7.7)$$

After repeated use of (4.12) and (4.16), the result simplifies to

$$Z_{1,2} = \left[(I_k - z_2)(I_k - z_1z_2)^{-1}Z_1 \;\;\; (I_k - z_1)(I_k - z_2z_1)^{-1}Z_2\right]. \qquad (7.8)$$

Defining the individual classical estimators for each experiment,

$$\hat{\beta}_i(y_i) = e_iX_i'\Sigma_{ii}^{-1}y_i \qquad (i = 1, 2), \qquad (7.9)$$

we obtain finally the combined-experiment estimate

$$f_\beta(y_1, X_1; y_2, X_2) = (I_k - z^{(1)} - z^{(2)})\,b + z^{(1)}\hat{\beta}_1(y_1) + z^{(2)}\hat{\beta}_2(y_2), \qquad (7.10)$$

where

$$z^{(1)} = (I_k - z_2)(I_k - z_1z_2)^{-1}z_1; \qquad z^{(2)} = (I_k - z_1)(I_k - z_2z_1)^{-1}z_2. \qquad (7.11)$$
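A numerical sketch that the combined estimate (7.10) agrees with the direct form (7.3) and (7.4), assuming $z^{(1)} = (I_k - z_2)(I_k - z_1z_2)^{-1}z_1$ and $z^{(2)} = (I_k - z_1)(I_k - z_2z_1)^{-1}z_2$. All numbers are illustrative. The test also compares $z^{(1)}$ with the precision form $(A^{-1} + e_1^{-1} + e_2^{-1})^{-1}e_1^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n1, n2 = 2, 3, 4
I = np.eye(k)
A = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([0.5, -1.0])
X1, X2 = rng.normal(size=(n1, k)), rng.normal(size=(n2, k))
S11, S22 = 0.8 * np.eye(n1), 1.2 * np.eye(n2)
y1, y2 = rng.normal(size=n1), rng.normal(size=n2)

def classical(X, S, y):
    """e_i from (7.7) and the classical estimator beta_hat_i of (7.9)."""
    S_inv = np.linalg.inv(S)
    e = np.linalg.inv(X.T @ S_inv @ X)
    return e, e @ X.T @ S_inv @ y

e1, bh1 = classical(X1, S11, y1)
e2, bh2 = classical(X2, S22, y2)
z1 = A @ np.linalg.inv(A + e1)                       # (7.7)
z2 = A @ np.linalg.inv(A + e2)
zc1 = (I - z2) @ np.linalg.inv(I - z1 @ z2) @ z1     # z^(1), (7.11)
zc2 = (I - z1) @ np.linalg.inv(I - z2 @ z1) @ z2     # z^(2), (7.11)
f_710 = (I - zc1 - zc2) @ b + zc1 @ bh1 + zc2 @ bh2  # (7.10)

# Direct combined estimate, (7.3)-(7.4), stacking both experiments.
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
S = np.block([[S11, np.zeros((n1, n2))], [np.zeros((n2, n1)), S22]])
Z12 = A @ X.T @ np.linalg.inv(S + X @ A @ X.T)
f_74 = b + Z12 @ (y - X @ b)
```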

This formula can then be rearranged so as to display a new prior mean, $b^{(2)}$, which is used as input to the second experiment, together with the credibility matrix $z^{(2)}$, in the "single-stage" formula

$$f_\beta(y_1, X_1; y_2, X_2) = (I_k - z^{(2)})\,b^{(2)} + z^{(2)}\hat{\beta}_2(y_2). \qquad (7.12)$$

Then we find that

$$b^{(2)} = b + (I_k - z^{(2)})^{-1}z^{(1)}\left[\hat{\beta}_1(y_1) - b\right] = (I_k - z_1)\,b + z_1\hat{\beta}_1(y_1) \qquad (7.13)$$

is just the usual first-stage credibility prediction (4.2) or (4.17), which becomes the mean input for the second experiment.

We may further clarify (7.12) by seeing what equivalent regression coefficient covariance, say $A^{(2)}$, is used as input to the second experiment to find the credibility coefficient in the usual way as

$$z^{(2)} = A^{(2)}(A^{(2)} + e_2)^{-1}. \qquad (7.14)$$

We find

$$A^{(2)} = (e_1^{-1} + A^{-1})^{-1} = (I_k - z_1)\,A, \qquad (7.15)$$

which is just the preposterior estimate of the error covariance (5.2) after the first experiment!

To summarize, we can view the two experiments $(n_1, X_1, y_1)$ and $(n_2, X_2, y_2)$:

(1) Either as a combined experiment in which the prior information $b$ and $A$ is used in (7.10) to form an estimate of $\beta(\theta)$;

(2) Or as a two-stage process in which $b$ and $A$ are used in the first experiment to form $b^{(2)}$ and $A^{(2)}$ by (7.13) and (7.15), and these values are then used as the prior vector mean and matrix covariance of the regression coefficients for the independent second experiment, forming an estimate of $\beta(\theta)$ using (7.12) and (7.14).
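The equivalence of the two viewpoints can be sketched numerically. The helper `credibility_update` below is a hypothetical name; it applies one stage in the form $(I_k - z)b + z\hat{\beta}$ with covariance output $(I_k - z)A$, as in (7.13) and (7.15). The data are random and purely illustrative.

```python
import numpy as np

def credibility_update(b, A, X, S, y):
    """One credibility stage: returns the new prior mean and covariance.

    e = (X' S^{-1} X)^{-1}, z = A (A + e)^{-1};
    mean (I - z) b + z beta_hat as in (7.13), covariance (I - z) A as in (7.15).
    """
    S_inv = np.linalg.inv(S)
    e = np.linalg.inv(X.T @ S_inv @ X)
    beta_hat = e @ X.T @ S_inv @ y
    z = A @ np.linalg.inv(A + e)
    I = np.eye(len(b))
    return (I - z) @ b + z @ beta_hat, (I - z) @ A

rng = np.random.default_rng(2)
b = np.array([0.5, -1.0])
A = np.array([[2.0, 0.3], [0.3, 1.0]])
X1, X2 = rng.normal(size=(3, 2)), rng.normal(size=(5, 2))
S1, S2 = 0.8 * np.eye(3), 1.2 * np.eye(5)
y1, y2 = rng.normal(size=3), rng.normal(size=5)

# View (2): two stages, feeding b^(2), A^(2) into the second experiment.
b2, A2 = credibility_update(b, A, X1, S1, y1)
f_seq, A_final = credibility_update(b2, A2, X2, S2, y2)

# View (1): one combined experiment with block-diagonal error covariance (7.2).
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
S = np.block([[S1, np.zeros((3, 5))], [np.zeros((5, 3)), S2]])
f_comb, A_comb = credibility_update(b, A, X, S, y)
```

The test also checks the statement that follows in the text: the final precision is the prior precision plus the observational precision of each experiment.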

The extension to multiple cascaded experiments is obvious. Also, it follows that, prior to both experiments, our estimate of the final covariance matrix is

$$\left(e_2^{-1} + [A^{(2)}]^{-1}\right)^{-1} = \left(A^{-1} + e_1^{-1} + e_2^{-1}\right)^{-1}.$$

In other words, the total final precision is estimated, prior to any experiment, to be the sum of the prior precision plus the observation precision of each experiment.

We  now  examine  several  special  cases  of  interest. 

7.1 Imprecise Experimental Results

If the first experiment is performed under poor observational conditions, we expect the diagonal elements of $\Sigma_{11}$ to be large compared to those of $X_1AX_1'$. Under these conditions, $z_1 \to 0$, and the results of the first experiment are ignored, with $b$ and $A$ used directly as inputs to the second stage. Similar remarks apply to imprecise results in the second experiment; and, of course, if both experiments have high observational variances, then the best forecast is just $b$.

7.2 Diffuse Prior Information

If, on the other hand, the prior variances of the regression coefficients are very large compared to the imputed covariances $e_1$ and $e_2$ due to observational error, then $z_1$ and $z_2$ approach the identity $I_k$, and we see from (7.14), (7.15), or by careful limits in (7.11), that $z^{(i)} \to (e_1^{-1} + e_2^{-1})^{-1}e_i^{-1}$ $(i = 1, 2)$, and

$$f_\beta(y_1, X_1; y_2, X_2) \to (e_1^{-1} + e_2^{-1})^{-1}\left[e_1^{-1}\hat{\beta}_1(y_1) + e_2^{-1}\hat{\beta}_2(y_2)\right]. \qquad (7.16)$$
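A sketch of the diffuse limit (7.16), assuming $A = s\,I_k$ with a very large $s$. The helper `combined_estimate` (a hypothetical name) evaluates (7.10) in its equivalent precision form $(A^{-1} + \sum_i e_i^{-1})^{-1}(A^{-1}b + \sum_i e_i^{-1}\hat{\beta}_i)$; data and the deliberately distant prior mean are illustrative.

```python
import numpy as np

def combined_estimate(b, A, experiments):
    """(7.10) in precision form; experiments is a list of (X_i, Sigma_ii, y_i)."""
    Q = np.linalg.inv(A)                  # start from the prior precision A^{-1}
    rhs = Q @ b
    for X, S, y in experiments:
        S_inv = np.linalg.inv(S)
        Q = Q + X.T @ S_inv @ X           # add e_i^{-1} = X_i' Sigma_ii^{-1} X_i
        rhs = rhs + X.T @ S_inv @ y       # add e_i^{-1} beta_hat_i
    return np.linalg.solve(Q, rhs)

rng = np.random.default_rng(3)
X1, X2 = rng.normal(size=(4, 2)), rng.normal(size=(6, 2))
S1, S2 = 0.5 * np.eye(4), 2.0 * np.eye(6)
y1, y2 = rng.normal(size=4), rng.normal(size=6)
b = np.array([10.0, -10.0])               # prior mean far from the data

# Right-hand side of (7.16): precision-weighted classical estimators.
e1_inv, e2_inv = X1.T @ X1 / 0.5, X2.T @ X2 / 2.0
bh1 = np.linalg.solve(e1_inv, X1.T @ y1 / 0.5)
bh2 = np.linalg.solve(e2_inv, X2.T @ y2 / 2.0)
f_716 = np.linalg.solve(e1_inv + e2_inv, e1_inv @ bh1 + e2_inv @ bh2)

f_diffuse = combined_estimate(b, 1e8 * np.eye(2), [(X1, S1, y1), (X2, S2, y2)])
f_informative = combined_estimate(b, 0.1 * np.eye(2), [(X1, S1, y1), (X2, S2, y2)])
```

With the very diffuse prior the estimate is essentially (7.16) and the distant $b$ has no influence; with a tight prior the estimate is pulled toward $b$.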
In other words, the prior information is ignored as the diagonal elements of $A$ become large (the prior becomes "diffuse"), and the resulting estimate weights the classical estimators from each experiment in the familiar proportional-to-precision manner. A formula similar to (7.16) is given by sampling theory arguments in the "mixed-estimation" method of Goldberger and Theil [8, Section 5-6] [9] [27] [28].

Alternatively, we may regard this case as one in which a prior mean $\hat{\beta}_1(y_1)$ and a prior covariance $e_1$ are used as input to the second experiment.


7.3 Direct Estimate of Regression Parameters

If the first experiment provides a direct measurement of the regression parameters, $\beta(\theta)$, then $n_1 = k$, $X_1 = I_k$, and, for consistency, we could call $y_1 = b_1$ a new estimate of $b$, with covariance of observation errors $\Sigma_{11} = A_1$, say. Then the credibility matrix in this special first experiment is $z_1 = A(A + A_1)^{-1}$, the mean input (7.13) to the second experiment is

$$b^{(2)} = (A^{-1} + A_1^{-1})^{-1}(A^{-1}b + A_1^{-1}b_1), \qquad (7.17)$$

and the covariance matrix input (7.15) is

$$A^{(2)} = (A^{-1} + A_1^{-1})^{-1}. \qquad (7.18)$$

In other words, if there are two prior estimates of the regression parameters, then they should be combined in the usual proportional-to-precision manner, and then used as input.
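A minimal sketch of (7.17) and (7.18); `pool_estimates` is a hypothetical helper name. The scalar example (prior variance 1, second estimate with variance 3) is illustrative, and the test checks it against the credibility form $(I_k - z_1)b + z_1b_1$ with $z_1 = A(A + A_1)^{-1}$.

```python
import numpy as np

def pool_estimates(b, A, b1, A1):
    """Combine two estimates of beta in proportion to precision, (7.17)-(7.18)."""
    A_inv, A1_inv = np.linalg.inv(A), np.linalg.inv(A1)
    A2 = np.linalg.inv(A_inv + A1_inv)         # (7.18)
    b2 = A2 @ (A_inv @ b + A1_inv @ b1)        # (7.17)
    return b2, A2

b = np.array([0.0]); A = np.array([[1.0]])     # first (prior) estimate
b1 = np.array([4.0]); A1 = np.array([[3.0]])   # direct measurement y_1 = b_1
b2, A2 = pool_estimates(b, A, b1, A1)          # b2 = [1.0], A2 = [[0.75]]
```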

7.4 Similar Experiments

If the design matrix, $X$, of the two experiments is the same (with a common error covariance $\Sigma$), then the common $z = A(A + e)^{-1}$ with $e = (X'\Sigma^{-1}X)^{-1}$, and the forecast (7.10) can be written

$$f_\beta(y_1; y_2; X) = e(2A + e)^{-1}b + 2A(2A + e)^{-1}\cdot\tfrac{1}{2}\left[\hat{\beta}(y_1) + \hat{\beta}(y_2)\right], \qquad (7.19)$$

with an obvious definition of the common function $\hat{\beta}(y)$. In this form, the analogy with the many-sample credibility forecast (3.11), (3.12) is obvious, and the extension to $t$ similar experiments is immediate:

$$f_\beta(y_1; y_2; \ldots; y_t; X) = [I_k - z(t)]\,b + z(t)\,\frac{1}{t}\sum_{i=1}^{t}\hat{\beta}(y_i), \qquad (7.20)$$

with a new credibility matrix

$$z(t) = tA(tA + e)^{-1}. \qquad (7.21)$$
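A short sketch of (7.20) and (7.21) for $t$ similar experiments; the function name and all numbers are illustrative. The test compares it with the precision form $(A^{-1} + t\,e^{-1})^{-1}(A^{-1}b + e^{-1}\sum_i\hat{\beta}(y_i))$.

```python
import numpy as np

def similar_experiments_forecast(b, A, e, beta_hats):
    """Forecast (7.20) with credibility matrix z(t) = tA(tA + e)^{-1}, (7.21)."""
    t = len(beta_hats)
    z_t = t * A @ np.linalg.inv(t * A + e)
    mean_hat = np.mean(beta_hats, axis=0)      # (1/t) sum_i beta_hat(y_i)
    return (np.eye(len(b)) - z_t) @ b + z_t @ mean_hat, z_t

# Scalar case: A = 1, e = 2, t = 2 gives z(2) = 0.5 and forecast 1.0.
f2, z2 = similar_experiments_forecast(np.array([0.0]), np.array([[1.0]]),
                                      np.array([[2.0]]), [np.array([1.0]), np.array([3.0])])

# A two-parameter case with t = 3.
A = np.array([[1.0, 0.2], [0.2, 0.5]])
e = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([0.0, 1.0])
beta_hats = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([2.0, 1.0])]
f3, z3 = similar_experiments_forecast(b, A, e, beta_hats)
```

As $t$ grows, $z(t) \to I_k$ and the forecast tends to the average of the classical estimates.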


7.5 Repeated Dissimilar Experiments

For completeness, we give the general formulae corresponding to (7.10), (7.11), when $t$ dissimilar experiments $(n_1, X_1, y_1), (n_2, X_2, y_2), \ldots, (n_t, X_t, y_t)$ are performed. In an obvious extension of notation,

$$f_\beta(y_i, X_i;\, i = 1, 2, \ldots, t) = \left[I_k - \sum_{i=1}^{t} z^{(i)}\right]b + \sum_{i=1}^{t} z^{(i)}\hat{\beta}_i(y_i), \qquad (7.22)$$

where the $z^{(i)}$ are the solutions of

$$[z^{(1)}\;\; z^{(2)}\;\; \cdots\;\; z^{(t)}]\begin{bmatrix} I_k & z_2 & \cdots & z_t \\ z_1 & I_k & \cdots & z_t \\ \vdots & \vdots & \ddots & \vdots \\ z_1 & z_2 & \cdots & I_k \end{bmatrix} = [z_1\;\; z_2\;\; \cdots\;\; z_t]. \qquad (7.23)$$

The prior-to-experiments estimate of the final covariance of the estimator error is

$$\left(A^{-1} + \sum_{i=1}^{t} e_i^{-1}\right)^{-1};$$

that is, the final precision is estimated to be the sum of the prior precision plus all of the observational precisions. Of course, it is probably easier to compute (7.22) in the recursive, stage-by-stage manner suggested earlier in this section.
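A sketch that solves a linear system for the weights $z^{(i)}$, assuming (7.23) has the form $[z^{(1)} \cdots z^{(t)}]\,M = [z_1 \cdots z_t]$, where block column $i$ of $M$ holds $I_k$ on the diagonal and $z_i$ elsewhere. This form follows from the relation $z^{(i)} = (I_k - \sum_{j \ne i} z^{(j)})z_i$, the $t$-experiment analogue of (7.13). The $e_i$ are random positive-definite matrices (illustrative), and the test compares with $(A^{-1} + \sum_j e_j^{-1})^{-1}e_i^{-1}$.

```python
import numpy as np

def cascade_weights(A, e_list):
    """Solve [z^(1) ... z^(t)] M = [z_1 ... z_t] for the weights z^(i).

    z_i = A (A + e_i)^{-1}; block (j, i) of M is I_k if j == i, else z_i.
    """
    k, t = A.shape[0], len(e_list)
    z = [A @ np.linalg.inv(A + e) for e in e_list]
    M = np.block([[np.eye(k) if j == i else z[i] for i in range(t)] for j in range(t)])
    rhs = np.hstack(z)
    # Z M = rhs is equivalent to M' Z' = rhs'
    sol = np.linalg.solve(M.T, rhs.T).T
    return [sol[:, i * k:(i + 1) * k] for i in range(t)]

rng = np.random.default_rng(4)
k = 2
A = np.array([[2.0, 0.3], [0.3, 1.0]])
e_list = []
for _ in range(3):
    G = rng.normal(size=(k, k))
    e_list.append(G @ G.T + 0.5 * np.eye(k))   # random positive-definite e_i
zs = cascade_weights(A, e_list)
```

$M$ is always nonsingular here: by the matrix determinant lemma, its determinant is a nonzero multiple of $\det(A\,[A^{-1} + \sum_i e_i^{-1}])$.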
8. Related Work

There are two papers which originated the application of credibility theory to regression problems. In a multidimensional model, with elaborate notation based on practical considerations, Hachemeister [10] has given prediction formulae equivalent to (4.18), (4.19); however, his derivation appears to require the assumption of homoscedastic error terms, i.e.,

$$\Sigma(\theta) = \sigma^2(\theta)\,I_n, \qquad (8.1)$$

or of the sample-mean generalization in which the $i$th diagonal term of $\Sigma(\theta)$ is $\sigma^2(\theta)/P_i$, where $P_i$ is the "volume" of the $i$th sample.

He also gives a credibility result for a homogeneous estimator, i.e., with $z_0 = 0$ in (3.4), and the remaining credibility coefficients constrained to give an unbiased estimator. For models of this type, one usually has collateral data [17] from similar experiments performed on other risks, with independent values of $\theta$.

Taylor's first paper [25] concentrates on the two-parameter, homogeneous estimator model, using essentially the same assumptions as Hachemeister, but with a simplified unbiasedness constraint. In a later paper [26], Taylor generalizes both the homogeneous and inhomogeneous versions of (4.18) to Hilbert spaces, and shows various special cases.

Turning to exact Bayesian regression results based upon multinormal likelihoods, Raiffa and Schlaifer [22] give formulae equivalent to (4.17) for the cases in which (1) $\sigma^2(\theta) = \sigma^2$ is a known constant, and the prior on $\beta(\theta)$ is multinormal $(b, A)$; (2) $(\sigma^2(\theta), \beta(\theta))$ are inverse-Gamma-multinormally distributed. Other models by Tiao, Zellner, and Chetty [29] [30] [32] [34] concentrate on the use of a diffuse prior density, $p(\beta, \sigma) \propto \sigma^{-1}$, or its multidimensional equivalent [32, Chapter 8]; thus, after one experiment, $\hat{\beta}(y_1)$ is "fully credible," or, after two experiments, results similar to (7.16) are obtained. Of course, since these are exact Bayesian results, the complete posterior distributions of the parameters are available, usually some variation of the multivariate-t density.

In [32, p. 240], Zellner takes an "informative" prior which is slightly more general than the usual natural-conjugate prior for the multinormal; his likelihood is multivariate, with homoscedastic errors, which can be reinterpreted as single-variate with arbitrary $\Sigma(\theta)$. By expanding the resulting posterior density for the regression parameters, he finds from the leading normal term a mean estimate which is "a 'matrix weighted average' of the prior mean... and the least-squares quantity $\hat{\beta}$ whose weights are the inverse of the prior covariance C and the sample covariance matrix." This is, of course, just our result (4.17), (4.18), (5.2), obtained as an approximation for arbitrary likelihood and prior densities.

We have also indicated that, using sampling theory arguments, Goldberger and Theil [8] [9] [27] [28] have obtained formulae similar to (7.16), except that, since $\sigma_i^2(\theta)$ $(i = 1, 2)$ is unknown, they propose substituting various reasonable sample estimates.


9. Exact Results

It can be seen from the above that the credibility formulae presented here are exact when the likelihood is multinormal and the prior is from a natural conjugate family. However, there are additional cases in which the credibility results are exact, based upon the Koopman-Pitman-Darmois exponential-type families and their (suitably enriched) natural conjugate priors. (See [12] [13] [14] for exact results for the model of (3.11).) These will be reported in a separate paper.


10.  Extensions 


Many of the topics which are considered as extensions in classical works on regression are already covered by our basic model, since no special assumption about the error covariance matrix $\Sigma(\theta)$ has been made; for example, error terms may be autocorrelated. Multivariate regression models are already "serially" included, and it remains only to translate them into the usual "parallel" notation. And, by following the discussion in Section 6, a variety of random input models may be elaborated; for example, successive inputs may follow a "random shocks" process [15].

There  are  many  interesting  regression  models  in  which  the 
design  matrix  is  not  of  full  rank.  In  these  cases,  (4.2)  and 
(4.4)  are  still  viable,  even  though  the  classical  estimators 
do  not  exist.  Or  one  may  add  additional  constraints,  based 
upon  external  considerations,  until  the  problem  is  "identifiable," 
in  the  classical  sense.  The  particular  problem  of  estimating 
flows  in  a  network  will  be  the  topic  of  a  future  report. 

For a simple linear regression, one can also talk about problems of inverse regression; that is, given $y$, what was the input $x$? These questions arise in various problems of measurement, and a detailed study of instrument calibration and measurement using credibility methods may be found in [18].
Bayesian Inverse Regression and Discrimination:


An  Application  of  Credibility  Theory 


R.  Avenhaus  and  W.S.  Jewell 


Abstract 


Many measurement problems can be formulated as follows: a certain linear relationship between two variables is to be estimated by using pairs of input and output data; the value of an unknown input variable is then estimated, given an observation of the corresponding output variable. This problem is often referred to as inverse regression or discrimination.

In this paper, we formulate a general Bayesian calibration and measurement model for this problem, in which prior information is assumed to be available on the relationship parameters, the possible values of the unknown input, and the output observation error. Simplified and easily interpreted formulae for estimating the posterior mean and variance of the input are then developed using the methods of credibility theory, a linearized Bayesian analysis developed originally for insurance estimation problems. A numerical example of the calibration of a calorimeter to measure nuclear material is given.


1. Problem Formulation

In this paper, we consider problems of the following kind: we wish to estimate the value of a certain state variable $x$ which cannot be measured directly, or only with very large error or effort. We know, however, of another state variable $y$, which is statistically dependent on $x$, and which can be measured more easily or accurately. Thus, in principle, we can estimate the relationship between $x$ and $y$, and then, with small effort, obtain $x$ by measuring $y$ and using the inverse relationship.

However, difficulty arises because we must use other pairs, $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$, to estimate the relationship. Often these will have been determined for other objectives and under different experimental conditions. Thus, the true values of independent and dependent variables may not be precisely known, or the relationship itself may be slightly different from what the data suggest.

Finally, as in most physical problems, we assume that a great deal of collateral information is available which gives us some prior idea of the relationship between $x$ and $y$, and even of the unknown value $x$ we are trying to estimate. In other words, we wish to make a Bayesian formulation of the problem.

Three examples of this class of problem are given below.

A. Calibration and Indirect Measurement of Nuclear Materials

Nuclear materials, e.g. plutonium, are extremely difficult to measure directly by chemical means. Therefore, one uses indirect methods, based upon the heat production or the number of neutrons emitted, in order to estimate the amount of material present. From well-known physical laws, we have a general relationship between these variables, but any measurement instrument based on these principles needs first to be calibrated. Usually, this calibration can be done with the aid of standard inputs, containing known amounts of nuclear materials. However, these inputs $\{x_i\}$ are not generally under our control, and in some cases may have residual imprecisions in their values.

Measurement instruments often have longer-term drifts, during which they tend to lose their original calibration. For this reason, measurement of a given production run often consists of two distinct phases: (re)calibration of the instrument, and actual indirect measurement. With a fixed amount of time available, it is of interest to determine how much time should be spent on the two phases, assuming that additional time spent on each observation reduces observational error.

B. Estimation of Family Incomes by Polling

We wish to estimate, through a public opinion poll, the distribution of family incomes in a certain city district. As most of the population will not be willing to divulge their incomes, or will give only a very imprecise figure, we look for a dependent variable which can be more easily determined. According to the literature (see, e.g., [10]), housing expenses are strongly related to family income, and, furthermore, it may be assumed that the population is less reluctant to divulge this figure, even though they may not be able to do so precisely. Clearly, to determine this relationship exactly, we must have some families in this district who are willing to give both their total income and their household expenses. On the other hand, we have strong prior information on this relationship from similar surveys, and may have general information on income distribution from census and other sources.

C. Missing Variables in Bayesian Regression

In a paper with this title [11], Press and Scott consider a simple linear regression problem in which certain of the independent variables, $x_i$, are assumed to be missing in a nonsystematic way from the data pairs $(x_i, y_i)$. Then, under special assumptions about the error and prior distributions, they show that an optimal procedure for estimating the linear parameters is to first estimate the missing $x_i$ from an inverse regression based only on the complete data pairs.

Problems of this kind are described in textbooks on the theory of measurements, and are sometimes called discrimination problems (Brownlee [1], Miller [9]).

In the following, we shall formulate these problems as Bayesian calibration and measurement problems, in the sense of Dunsmore [3] [4], Hoadley [5], and Lindley [8]. This formulation is quite general, and although the language corresponds to that of Example A, the translation to other examples is easily made.

Because of the strong distributional specification requirements of the full Bayesian analysis, we shall then use the approach of credibility theory to find best linear approximations to moments of interest. The resulting formulae enable us to easily display the relative value of prior information, on the one hand, and information obtained in the calibration, on the other. We will develop further the optimization problem described in Example A above, and will consider a numerical example of calibration and indirect measurement of nuclear material.

2. Bayesian Calibration and Measurement Model

To develop the Bayesian model, we suppose that:

(1) Calibration consists of $n$ independent pairs of input and output observations $(\mathbf{x}, \mathbf{y}) = \{(x_i, y_i);\ i = 1, 2, \ldots, n\}$. $x_i$ is a relatively precise or standard input, and $y_i$ is the observed output on a measurement instrument, which specifies a statistical relationship between these pairs through a conditional measurement density, $p(y_i \mid x_i, \theta)$; the measurement density depends upon a fixed but unknown measurement parameter $\theta$, for which we have a prior density, $p(\theta)$*;

(2) Measurement consists of using the same instrument on a sample of unknown input, $\tilde{x} = x$, to obtain an output $\tilde{y} = y$, say; the problem is then to infer the value of $x$. Since this cannot be accomplished exactly, we must, in general, settle for an estimate, $\hat{x}$, which, in the remainder of the paper, we will assume to be $\mathcal{E}\{x \mid y; \mathbf{x}, \mathbf{y}\}$. Other Bayes estimators may be important in other physical situations.

Following  [S] ,  we  see  that  we  must  compute  the  posterior 
conditional  density, 

* We use the convention that the arguments of any $p(\cdot)$ indicate the particular density in question, which may be with respect to Lebesgue or discrete measure. Where necessary, we indicate a random variable with a tilde; i.e., $\tilde{x}$ is the random variable corresponding to $x$, etc.
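$$p(x \mid y; \mathbf{x}, \mathbf{y}) = \frac{p(x, y, \mathbf{y} \mid \mathbf{x})}{p(y, \mathbf{y} \mid \mathbf{x})} = \frac{\int p(y, \mathbf{y} \mid x, \mathbf{x}, \theta)\, p(\theta \mid \mathbf{x})\, p(x \mid \mathbf{x}, \theta)\, d\theta}{\int p(x', y, \mathbf{y} \mid \mathbf{x})\, dx'}, \qquad (2.1)$$

from which the mean, $\mathcal{E}\{x \mid y; \mathbf{x}, \mathbf{y}\}$, will be our estimate of the unknown input, and the variance, $\mathcal{V}\{x \mid y; \mathbf{x}, \mathbf{y}\}$, will be the norm for our optimization problem, since we wish to make the estimate as precise as possible in the least-squares sense.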

To proceed further, we must make additional statistical assumptions appropriate to our problem:

(1) Given $\theta$, we assume that the measurements are independent:

$$p(y, \mathbf{y} \mid x, \mathbf{x}, \theta) = p(y \mid x, \theta)\prod_{i=1}^{n} p(y_i \mid x_i, \theta);$$

(2) We assume that the prior on the measurement parameter is unrelated to any of the inputs:

$$p(\theta \mid x, \mathbf{x}) = p(\theta);$$

(3) Any unknown input in the measurement process, $x$, is selected independently from the standard inputs, $\mathbf{x} = [x_1, x_2, \ldots, x_n]'$, and the parameter $\theta$:

$$p(x \mid \mathbf{x}, \theta) = p(x).$$

The third assumption is the strongest, and may not hold, for example, when the calibration inputs and the test input come from the same production process. However, in our case, we assume that the calibration inputs are independent standards.
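By elementary manipulations, we obtain:

$$p(x \mid y; \mathbf{x}, \mathbf{y}) = \frac{p(x)\int p(y \mid x, \theta)\, p(\theta \mid \mathbf{x}, \mathbf{y})\, d\theta}{\int p(y \mid \theta')\, p(\theta' \mid \mathbf{x}, \mathbf{y})\, d\theta'}, \qquad (2.2)$$

where $p(y \mid \theta) = \int p(y \mid x, \theta)\, p(x)\, dx$, and

$$p(\theta \mid \mathbf{x}, \mathbf{y}) = \frac{\prod_{i=1}^{n} p(y_i \mid x_i, \theta)\, p(\theta)}{\int \prod_{j=1}^{n} p(y_j \mid x_j, \theta')\, p(\theta')\, d\theta'}. \qquad (2.3)$$

Notice that the denominators of (2.2) and (2.3) are just normalizations, which may be computed directly at any time.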

In the above form, it is clear that the problem breaks apart mathematically into two problems:

(1) The updating of $p(\theta)$ to $p(\theta \mid \mathbf{x}, \mathbf{y})$ (calibration);

(2) The calculation of moments of interest for $p(x \mid y, \theta)$, averaged over the appropriate density of $\theta$ (measurement).

We tackle these problems in reverse order, since the only effect of calibration is to modify the prior information about the regression parameters and to improve the precision of this estimate.
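The two-step structure of (2.2) and (2.3) can be sketched by brute force on a grid. Everything concrete here is an assumption made only for illustration: normal densities for $p(y \mid x, \theta)$ and $p(x)$, $\theta = (\beta_1, \beta_2)$ on a discrete grid with a flat prior, a known error standard deviation, and made-up calibration pairs.

```python
import numpy as np

def normal_pdf(v, mean, sd):
    """Normal density, an assumed form for p(y | x, theta) and p(x)."""
    return np.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical discretization of theta = (beta_1, beta_2) with a flat prior p(theta).
B1, B2 = np.meshgrid(np.linspace(-1.0, 1.0, 41), np.linspace(0.5, 1.5, 41), indexing="ij")
prior_theta = np.full(B1.shape, 1.0 / B1.size)

def calibrate(xs, ys, sigma):
    """(2.3): p(theta | x, y) is proportional to prod_i p(y_i | x_i, theta) p(theta)."""
    post = prior_theta.copy()
    for xi, yi in zip(xs, ys):
        post *= normal_pdf(yi, B1 + B2 * xi, sigma)
    return post / post.sum()            # denominator of (2.3): a normalization

def measure(y, post_theta, x_grid, prior_x, sigma):
    """(2.2): p(x | y; x, y) is proportional to p(x) * sum over theta of p(y | x, theta) p(theta | x, y)."""
    lik = np.array([np.sum(normal_pdf(y, B1 + B2 * x, sigma) * post_theta) for x in x_grid])
    post_x = prior_x * lik
    return post_x / post_x.sum()        # denominator of (2.2): a normalization

# Made-up calibration pairs and one measured output.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([0.1, 1.0, 2.1, 2.9])
post_theta = calibrate(xs, ys, sigma=0.2)
x_grid = np.linspace(-1.0, 5.0, 601)
p_x = measure(2.0, post_theta, x_grid, normal_pdf(x_grid, 2.0, 1.0), sigma=0.2)
x_mean = np.sum(x_grid * p_x)                   # approximates E{x | y; x, y}
x_var = np.sum((x_grid - x_mean) ** 2 * p_x)    # approximates V{x | y; x, y}
```

This exact but expensive computation is what the credibility approximations of the following section are meant to avoid.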


3. Estimation of Input Using Credibility Theory

To find the moments of $p(x \mid y, \theta) = p(y \mid x, \theta)\, p(x) / \int p(y \mid x', \theta)\, p(x')\, dx'$, we must in the general case make distributional assumptions about $p(x)$ and $p(y \mid x, \theta)$. However, since only the moments of this density are of interest, it is desirable to have a simpler, distribution-free approach, such as that provided by credibility theory [6] [7]. In this approach, Bayesian means conditional on given data $w$, say, are approximated by linear combinations of certain functions of $w$, chosen from physical considerations; the coefficients are then chosen to minimize the mean-square approximation error prior to observing $w$. In certain cases, these approximation formulae are also the exact Bayesian conditional means [6].

The usual assumption about a measurement process is that, given the measurement parameter $\theta$, there is a linear relation between the true input and the true output, but that the observed process may contain an additional uncorrelated measurement observation error, with zero mean and known variance. This may be conveniently expressed as:

$$\mathcal{E}\{y \mid x, \theta\} = \beta_1(\theta) + \beta_2(\theta)\,x; \qquad (3.1)$$

$$\mathcal{V}\{y \mid x, \theta\} = \sigma^2. \qquad (3.2)$$

(In other applications, the observation error may also depend upon $\theta$ or the level of $x$.) We call $\beta_1(\theta)$ and $\beta_2(\theta)$ instrument parameters.

We  know  that,  for  general  ptx,yl6) ,  the  fact  that  the 
regression  of  y  upon  x  (3.1)  is  linear  does  not  necessarily 
mean  that  the  regression  of  x  upon  y  is  linear  in  y.  However, 
it  is  true  in  the  case  of  the  normal  and  some  other  bivariate 
distributions,  and  seems  a  desirable  characteristic  of  any 
measurement  process.  Therefore,  we  shall  assxime  that  our  prior 
estimate  of  the  true  input  x,  given  an  observed  output  y,  may 
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be  approximated  by  the  linear  function: 

(f{x|y}  =  **  f(y)  =20*^  ^1^  ' 

where the "credibility coefficients" $z_0$, $z_1$ are chosen so as
to minimize the approximation error variance:

$$H_a = \mathcal{E}\{[\mathcal{E}\{x \mid y\} - f(y)]^2\}. \tag{3.4}$$

For the remainder of this section, we shall treat the averaging
over θ as if it were with respect to the prior p(θ), realizing
that in the next section we shall change to $p(\theta \mid \mathbf{x}, \mathbf{y})$, to add the
information provided by the calibration.

One can easily show [6,7] [2, Appendix 3] that the optimal
credibility coefficients are given by:

$$z_0 = \mathcal{E}\{x\} - z_1\,\mathcal{E}\{y\}; \tag{3.5}$$

$$z_1 = \frac{\mathcal{C}\{y;x\}}{\mathcal{V}\{y\}}, \tag{3.6}$$

so that the optimal estimator is unbiased.

$\mathcal{E}\{x\}$ represents our prior estimate of the value of the
input to be measured; the remaining moments must be calculated
from our measurement assumptions (3.1) (3.2). From (3.1):

$$\mathcal{E}\{y\} = b_1 + b_2\,\mathcal{E}\{x\}, \tag{3.7}$$

where

$$b_i = \mathcal{E}\{\beta_i(\theta)\} \quad (i = 1,2) \tag{3.8}$$

are the mean prior estimates of the instrument parameters.
By unconditioning (3.2) on x and θ, we find:

$$\mathcal{V}\{y\} = \sigma_M^2 + \mathcal{V}\{x\}\,(b_2^2 + \Delta_{22}) + \Delta_{11} + 2\Delta_{12}\,\mathcal{E}\{x\} + \Delta_{22}\,[\mathcal{E}\{x\}]^2, \tag{3.9}$$

where

$$\Delta_{ij} = \mathcal{C}\{\beta_i(\theta);\beta_j(\theta)\} \quad (i,j = 1,2) \tag{3.10}$$

are the prior estimates of the (co)variances in the instrument
parameters. We see that the total prior-to-measurement
variance in the observation is composed of three groups of terms:


(1)  The  observation  error  variance; 

(2)  The  prior  variation  in  input; 

(3) (Co)variances in instrument parameters.


Through the denominator of (3.6), an increase in any one of
these will reduce the weight, $z_1$, attached to the observed
output, y, in (3.3).

There is only one prior source of covariance between input
and output:

$$\mathcal{C}\{y;x\} = b_2\,\mathcal{V}\{x\}, \tag{3.11}$$

which  means  that,  as  the  uncertainty  in  the  input  increases, 
one  must  attach  more  importance  to  the  observed  output  in  (3.3). 
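The moment formulas (3.9) and (3.11) can be checked by simulation. The sketch below is an illustration only: it assumes Gaussian draws for the input, the instrument parameters and the observation error (the formulas themselves need only first and second moments), draws x independently of θ as the derivation of (3.9) presumes, and uses hypothetical numbers; the function names are likewise hypothetical.

```python
import random
import statistics


def moments_formula(b, delta, mean_x, var_x, sigma2_m):
    """V{y} from (3.9) and C{y;x} from (3.11)."""
    var_y = (sigma2_m + var_x * (b[1] ** 2 + delta[1][1]) + delta[0][0]
             + 2 * delta[0][1] * mean_x + delta[1][1] * mean_x ** 2)
    return var_y, b[1] * var_x


def moments_simulated(b, delta, mean_x, var_x, sigma2_m, n=100_000, seed=1):
    """Monte Carlo estimates of V{y} and C{y;x} under (3.1)-(3.2).

    Gaussian sampling is an assumption made only for this illustration.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 2x2 prior covariance Delta of (beta1, beta2)
    l11 = delta[0][0] ** 0.5
    l21 = delta[1][0] / l11
    l22 = max(delta[1][1] - l21 ** 2, 0.0) ** 0.5
    xs, ys = [], []
    for _ in range(n):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        beta1 = b[0] + l11 * u1
        beta2 = b[1] + l21 * u1 + l22 * u2
        x = rng.gauss(mean_x, var_x ** 0.5)          # input, independent of theta
        y = beta1 + beta2 * x + rng.gauss(0.0, sigma2_m ** 0.5)  # (3.1)-(3.2)
        xs.append(x)
        ys.append(y)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov_yx = sum((yi - my) * (xi - mx) for xi, yi in zip(xs, ys)) / n
    return statistics.pvariance(ys), cov_yx


b = (1.0, 2.0)                    # hypothetical b1, b2
delta = [[0.5, 0.1], [0.1, 0.2]]  # hypothetical Delta
print(moments_formula(b, delta, 3.0, 1.0, 0.25))    # V{y} = 7.35, C{y;x} = 2 (up to rounding)
print(moments_simulated(b, delta, 3.0, 1.0, 0.25))  # should be close to the line above
```

Only the three groups of terms listed above enter $\mathcal{V}\{y\}$; dropping any one of them from `moments_formula` makes the two printed lines disagree.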

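For convenience, we reproduce the final formula for the
estimate of the true input:

$$f(y) = \mathcal{E}\{x\} + z_1\,(y - \mathcal{E}\{y\}) = (1 - b_2 z_1)\,\mathcal{E}\{x\} + z_1\,(y - b_1), \tag{3.12}$$

$$z_1 = \frac{b_2\,\mathcal{V}\{x\}}{\sigma_M^2 + \mathcal{V}\{x\}\,(b_2^2 + \Delta_{22}) + \Delta_{11} + 2\Delta_{12}\,\mathcal{E}\{x\} + \Delta_{22}\,[\mathcal{E}\{x\}]^2}. \tag{3.13}$$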


Thus, in the credibility approach, only seven prior moments must
be specified: the mean and variance of the potential input,
and the two means and three (co)variances of the instrument
coefficients.
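As a sketch of how the seven moments (together with the observation error variance) combine, the functions below transcribe (3.5) to (3.13) directly. The function and argument names are hypothetical, and the usage line uses made-up numbers for a perfectly known instrument $y = x + \text{noise}$ with unit variances.

```python
def credibility_coefficients(mean_x, var_x, b1, b2, d11, d12, d22, sigma2_m):
    """Return (z0, z1) for f(y) = z0 + z1*y, using (3.5)-(3.11)."""
    mean_y = b1 + b2 * mean_x                                  # (3.7)
    var_y = (sigma2_m + var_x * (b2 ** 2 + d22) + d11
             + 2 * d12 * mean_x + d22 * mean_x ** 2)           # (3.9)
    cov_yx = b2 * var_x                                        # (3.11)
    z1 = cov_yx / var_y                                        # (3.6), i.e. (3.13)
    z0 = mean_x - z1 * mean_y                                  # (3.5)
    return z0, z1


def estimate_input(y, mean_x, var_x, b1, b2, d11, d12, d22, sigma2_m):
    """Credibility estimate f(y) of the true input, (3.3) and (3.12)."""
    z0, z1 = credibility_coefficients(mean_x, var_x, b1, b2,
                                      d11, d12, d22, sigma2_m)
    return z0 + z1 * y


# Hypothetical: E{x} = V{x} = 1, y = x + noise with unit noise variance.
print(estimate_input(3.0, mean_x=1.0, var_x=1.0, b1=0.0, b2=1.0,
                     d11=0.0, d12=0.0, d22=0.0, sigma2_m=1.0))  # 2.0
```

Here $z_1 = 1/2$, so the estimate lies halfway between the prior mean 1 and the reading 3; as `sigma2_m` grows, $z_1$ vanishes and the estimate returns to $\mathcal{E}\{x\}$.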

It is of interest to examine several limiting cases of
the estimator (3.12) (3.13) in more detail. First, as already
mentioned, if either the observation error variance $\sigma_M^2$ or any
of the instrument variances is very large (sometimes called a
"diffuse" calibration prior), then, since $z_1$ vanishes, the best
estimate of x is its prior mean, $\mathcal{E}\{x\}$; the measurement process
gives little additional information. Similarly, the vanishing
of $\mathcal{V}\{x\}$ makes $\mathcal{E}\{x\}$ very reliable.

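On the other hand, suppose that we have a "diffuse" prior
on the level of input, that is, although $\mathcal{E}\{x\}$ is given, $\mathcal{V}\{x\} \to \infty$.
In this case the forecast can be rewritten:

$$f(y) \to \left[1 + \frac{\Delta_{22}}{b_2^2}\right]^{-1}\left[\frac{y - b_1}{b_2} + \frac{\Delta_{22}}{b_2^2}\,\mathcal{E}\{x\}\right]. \tag{3.14}$$

If $\Delta_{22}/b_2^2$ is small compared with unity, we obtain exactly the
deterministic result corresponding to (3.1), $y = b_1 + b_2 x$, that is,
$f(y) = (y - b_1)/b_2$.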
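The limit (3.14) can be checked numerically by letting $\mathcal{V}\{x\}$ grow in (3.12) and (3.13). This is a sketch with hypothetical instrument moments ($b_1 = 2$, $b_2 = 4$, $\Delta_{22} = 0.5$) and hypothetical function names.

```python
def forecast(y, mean_x, var_x, b1, b2, d11, d12, d22, sigma2_m):
    """f(y) from (3.12)-(3.13)."""
    var_y = (sigma2_m + var_x * (b2 ** 2 + d22) + d11
             + 2 * d12 * mean_x + d22 * mean_x ** 2)
    z1 = b2 * var_x / var_y
    return (1 - b2 * z1) * mean_x + z1 * (y - b1)


def diffuse_limit(y, mean_x, b1, b2, d22):
    """Limit of f(y) as V{x} -> infinity, as in (3.14)."""
    k = 1.0 / (1.0 + d22 / b2 ** 2)
    return k * ((y - b1) / b2 + (d22 / b2 ** 2) * mean_x)


# Hypothetical instrument y = 2 + 4x with some slope uncertainty.
for var_x in (1.0, 1e3, 1e6):
    print(var_x, forecast(10.0, 5.0, var_x, 2.0, 4.0, 0.3, 0.1, 0.5, 1.0))
print("limit", diffuse_limit(10.0, 5.0, 2.0, 4.0, 0.5))
```

With `d22 = 0.0` the limit reduces to the deterministic inversion $(y - b_1)/b_2$.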

In the optimization model of Section 6, we shall need the
mean-square value of the error between the true value x and the
predictor f(y), that is, the variance of forecast error:

$$H = \mathcal{E}\{(x - f(y))^2\}. \tag{3.15}$$


But, by elementary manipulations,

$$H = H_0 + H_a, \tag{3.16}$$

where $H_0$ is the irreducible forecast variance using the Bayesian
conditional mean:

$$H_0 = \mathcal{E}\{\mathcal{E}\{(x - \mathcal{E}\{x \mid y\})^2 \mid y\}\} = \mathcal{E}\{\mathcal{V}\{x \mid y\}\}, \tag{3.17}$$

and $H_a$ is given by (3.4).

With the optimal choice of credibility coefficients, we obtain:

$$H = \mathcal{V}\{x\} - z_1\,\mathcal{C}\{y;x\} = \mathcal{V}\{x\}\,(1 - z_1 b_2). \tag{3.18}$$

H in (3.15) and (3.18) is the variance of forecast error for
one inverse measurement. If r such measurements are performed,
with independent, identically distributed inputs, then one can
easily show that the variance of the total error will be:

$$H_r = r\,\mathcal{V}\{x\}\,(1 - z_1 b_2) + (r^2 - r)\,z_1^2\left(\Delta_{11} + 2\Delta_{12}\,\mathcal{E}\{x\} + \Delta_{22}\,[\mathcal{E}\{x\}]^2\right). \tag{3.19}$$

We see that, in addition to the expected first term, which is
r times (3.18), there is a component which is proportional
to $r^2$. This represents a possible persistence of error due
to instrument parameter covariances, which may cause the
individual forecast errors to be positively correlated.
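The two parts of (3.19) can be computed separately to see which one dominates. The sketch below is a direct transcription; the function name is hypothetical, and the pairwise term is the covariance between the forecast errors of two measurements that share the same θ.

```python
def total_error_variance(r, mean_x, var_x, b2, d11, d12, d22, sigma2_m):
    """Variance of the total forecast error over r measurements, (3.19).

    r = 1 gives the single-measurement value H of (3.18).
    """
    var_y = (sigma2_m + var_x * (b2 ** 2 + d22) + d11
             + 2 * d12 * mean_x + d22 * mean_x ** 2)       # (3.9)
    z1 = b2 * var_x / var_y                                 # (3.13)
    independent = r * var_x * (1 - z1 * b2)                 # r times (3.18)
    # Covariance of two forecast errors, caused by the common theta
    pair_cov = z1 ** 2 * (d11 + 2 * d12 * mean_x + d22 * mean_x ** 2)
    return independent + (r * r - r) * pair_cov


# Hypothetical moments, with and without instrument-parameter uncertainty.
print(total_error_variance(10, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0))  # 10 * H
print(total_error_variance(10, 1.0, 1.0, 1.0, 0.2, 0.0, 0.1, 1.0))
```

With Δ = 0 the errors are uncorrelated and $H_r = rH$; any prior uncertainty in the instrument parameters adds a term that grows like $r^2$.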
4. Updating of Instrument Parameters Using Credibility Theory

We turn now to the problem of incorporating the results
of the calibration experiments into our prior-to-measurement
density on θ. Remember that the number, n, of such experiments,
and the previously calibrated levels of the inputs,
$x_i$ (i = 1, 2, ..., n), are assumed to be fixed by external
considerations. See also Section 6 below.

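Assuming that (3.1) and (3.2) apply also to calibration
(i.e. the same instrument is used), we may write:

$$\mathcal{E}\{\mathbf{y} \mid \mathbf{x}, \theta\} = X\,\boldsymbol{\beta}(\theta), \tag{4.1}$$

$$\mathcal{C}\{\mathbf{y};\mathbf{y} \mid \mathbf{x}, \theta\} = \sigma_c^2\, I_n \quad (*), \tag{4.2}$$

where

$$\mathbf{y} = [y_1, y_2, \ldots, y_n]', \quad \mathbf{x} = [x_1, x_2, \ldots, x_n]', \quad \boldsymbol{\beta}(\theta) = [\beta_1(\theta), \beta_2(\theta)]', \quad X = [\mathbf{1}_n, \mathbf{x}],$$

$\mathbf{1}_n$ is a vector of n ones, $I_n$ is the unit matrix of order n,
and $\sigma_c^2$ is the observation variance for each output $y_i$ (i = 1, 2, ..., n).
We thus have a formulation as a Bayesian regression problem, in
which we want to estimate various moments of $p(\boldsymbol{\beta}(\theta) \mid \mathbf{x}, \mathbf{y})$. In
particular, from (3.8) (3.10) (3.13) (3.18), we see that the first
and second moments

$$\mathcal{E}\{\boldsymbol{\beta}(\theta) \mid \mathbf{x}, \mathbf{y}\}; \qquad \mathcal{C}\{\boldsymbol{\beta}(\theta);\boldsymbol{\beta}(\theta) \mid \mathbf{x}, \mathbf{y}\}$$

will be needed.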

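(*) Vector covariance is defined as

$$\mathcal{C}\{\mathbf{w};\mathbf{z}\} = \mathcal{E}\{\mathbf{w}\mathbf{z}'\} - \mathcal{E}\{\mathbf{w}\}\,[\mathcal{E}\{\mathbf{z}\}]'$$

for any two random vectors w and z.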
Rather than make distributional assumptions, such as those
followed in [13], we shall again make a credibility approximation,
this time to $\mathcal{E}\{\boldsymbol{\beta}(\theta) \mid \mathbf{x}, \mathbf{y}\}$. The appropriate theory has been
developed in [ ], and we shall give only the necessary results
here.

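First, we approximate the desired mean instrument parameter
vector by a linear function of the data vector $\mathbf{y}$:

$$\mathcal{E}\{\boldsymbol{\beta}(\theta) \mid \mathbf{x}, \mathbf{y}\} \approx \mathbf{g}(\mathbf{y}) = \mathbf{z}_0 + Z\,\mathbf{y}, \tag{4.3}$$

where $\mathbf{g}$, $\mathbf{z}_0$ are two-vectors, Z is a 2 × n matrix, and the
credibility coefficients are chosen so as to minimize the mean-square
approximation error of both components to those of the Bayesian
conditional mean vector. After some algebra it is shown in [7] that
the optimal credibility forecast can be written as:

$$\mathbf{g}(\mathbf{y}) = (I_2 - \mathbf{z})\,\mathbf{b} + \mathbf{z}\,\hat{\boldsymbol{\beta}}(\mathbf{y}), \tag{4.4}$$

where $\mathbf{b} = [b_1, b_2]'$ is the vector of prior-to-calibration means,
$\mathbf{z}$ is a new 2 × 2 credibility matrix,

$$\mathbf{z} = [\Delta(X'\Sigma^{-1}X)]\,[I_2 + \Delta(X'\Sigma^{-1}X)]^{-1} \tag{4.5}$$

(the terms in square brackets commute), and $\hat{\boldsymbol{\beta}}(\mathbf{y})$ is the
classical regression estimator of $\boldsymbol{\beta}$:

$$\hat{\boldsymbol{\beta}}(\mathbf{y}) = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}\mathbf{y}. \tag{4.6}$$

Δ is the 2 × 2 matrix of prior-to-calibration covariances
defined in (3.10), and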
$$\Sigma = \sigma_c^2\, I_n. \tag{4.7}$$

Thus, in our model, the "regression errors" are "homoscedastic",
and we get the further simplifications:

$$\mathbf{z} = \left[\frac{\Delta X'X}{\sigma_c^2}\right]\left[I_2 + \frac{\Delta X'X}{\sigma_c^2}\right]^{-1} \tag{4.8}$$

and

$$\hat{\boldsymbol{\beta}}(\mathbf{y}) = (X'X)^{-1}X'\mathbf{y}, \tag{4.9}$$


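where

$$X'X = nM = n\begin{bmatrix} 1 & \sum_{i=1}^{n} x_i/n \\ \sum_{i=1}^{n} x_i/n & \sum_{i=1}^{n} x_i^2/n \end{bmatrix} = n\begin{bmatrix} 1 & m_1 \\ m_1 & m_2 \end{bmatrix}, \tag{4.10}$$

i.e. n times a matrix of deterministic moments $m_1$, $m_2$ describing
the predetermined calibration inputs. One may easily verify that:

$$M^{-1} = \frac{1}{m_2 - m_1^2}\begin{bmatrix} m_2 & -m_1 \\ -m_1 & 1 \end{bmatrix}.$$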

The results (4.4) (4.8) (4.9) are intuitively very satisfying,
for they show that our estimate of the instrument coefficients
after calibration should be taken as a linear mixture of our prior
hypothesis, $\mathbf{b}$, and of the well-known classical estimator,
$\hat{\boldsymbol{\beta}}(\mathbf{y})$. The credibility attached to the latter depends upon
the so-called design matrix, X, the observation error variance,
$\sigma_c^2$, and the instrument covariances, Δ. (See Jewell [ ].)
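A minimal sketch of the homoscedastic calibration update, transcribing (4.4), (4.8) and (4.9) with plain nested lists so that no linear-algebra library is assumed. The helper and function names are hypothetical, (4.4) is evaluated in the equivalent form $\mathbf{g} = \mathbf{b} + \mathbf{z}(\hat{\boldsymbol{\beta}} - \mathbf{b})$, and the calibration data in the usage lines are made up to lie exactly on $y = 1 + 2x$.

```python
def mat2_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def mat2_inv(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def calibration_credibility(xs, ys, b, delta, sigma2_c):
    """Return (g, z, beta_hat) from (4.4), (4.8), (4.9)."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    xtx = [[n, n * m1], [n * m1, n * m2]]                  # X'X = nM, (4.10)
    xty = [sum(ys), sum(x * y for x, y in zip(xs, ys))]    # X'y
    inv = mat2_inv(xtx)
    beta_hat = [inv[i][0] * xty[0] + inv[i][1] * xty[1] for i in range(2)]  # (4.9)
    a = [[v / sigma2_c for v in row] for row in mat2_mul(delta, xtx)]  # Delta X'X / sigma_c^2
    z = mat2_mul(a, mat2_inv([[1 + a[0][0], a[0][1]],
                              [a[1][0], 1 + a[1][1]]]))                 # (4.8)
    # (4.4) rewritten as g = b + z (beta_hat - b)
    g = [b[i] + sum(z[i][j] * (beta_hat[j] - b[j]) for j in range(2))
         for i in range(2)]
    return g, z, beta_hat


# Hypothetical calibration run lying exactly on y = 1 + 2x; prior b = (0, 0).
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
for scale in (1e-6, 1.0, 1e6):
    g, _, _ = calibration_credibility(xs, ys, [0.0, 0.0],
                                      [[scale, 0.0], [0.0, scale]], 1.0)
    print(scale, g)
```

A tiny Δ keeps $\mathbf{g}$ at the prior $\mathbf{b}$, while a very large Δ gives full credibility to the classical estimate (1, 2), as in the limiting cases discussed next.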
Several limiting cases are of interest. First, as our
observation error variance $\sigma_c^2$ gets very large, $\mathbf{z}$ vanishes, and
no credibility is attached to the calibration experiment;
it is better to stick with the prior estimates.

Conversely, if all the prior instrument covariances, $\Delta_{ij}$,
get very large, then $\mathbf{z} \to I_2$: "full credibility" is attached
to the calibration data; the same result occurs as $\sigma_c^2 \to 0$.

Note also that full credibility occurs as the length of the
calibration run, n, increases, as long as the successive inputs
are chosen in such a way as to keep $m_1$ and $m_2$ about the same;
in other words, the more calibration, the more weight is attached
to the results.

The above model may be easily generalized to the case where
the standard inputs themselves are subject to errors. In this
case, we suppose that the selection of a "target input" i specifies
$\mathcal{E}\{x_i\}$ rather than $x_i$; the actual input differs from the
mean by a known variance, $\mathcal{V}\{x_i\}$. The reader may easily verify
that the above formulae again apply, with $X = [\mathbf{1}_n, \mathcal{E}\{\mathbf{x}\}]$ and with
Σ in (4.7) replaced by a new diagonal matrix, with terms:

$$\sigma_c^2 + \mathcal{V}\{x_i\}\,(b_2^2 + \Delta_{22}) \quad (i = 1, 2, \ldots, n). \tag{4.11}$$

In the general case, the formulae (4.5) (4.6) must now be used;
however, if the precision of the standards is the same, the
regression is again homoscedastic, and (4.8) (4.9) may be used,
but with $\sigma_c^2$ replaced by (4.11).

As far as the mean-square error in fitting $\boldsymbol{\beta}(\theta)$ by (4.4)
is concerned, we can also show that the prior covariance matrix,
with optimal choice of credibility coefficients, is:

$$\phi(X) = \mathcal{E}\{(\boldsymbol{\beta}(\theta) - \mathbf{g}(\mathbf{y}))(\boldsymbol{\beta}(\theta) - \mathbf{g}(\mathbf{y}))' \mid X\} = (I_2 - \mathbf{z})\,\Delta = \mathbf{z}\,(X'\Sigma^{-1}X)^{-1}. \tag{4.12}$$

If this fit is good, then $\phi_{ij}$ will be a good approximation to
$\mathcal{C}\{\beta_i(\theta);\beta_j(\theta)\}$ after the calibration, at least as we perceive
it to be before we actually obtain the outputs $\mathbf{y}$. In other
words, $\phi(X)$ is our preposterior estimate of the covariance
between instrument parameters.

It should be remembered that only the diagonal terms of
(4.12) were individually optimized in the choice of credibility
coefficients; one can easily show that the diagonal elements
of $\phi(X)$ are less than those of Δ.
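The two expressions in (4.12) can be compared numerically, together with the claim that the diagonal of $\phi(X)$ lies below that of Δ. This is a sketch for the homoscedastic case $\Sigma = \sigma_c^2 I_n$; the design, the prior covariances and all names are hypothetical.

```python
def mat2_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def mat2_inv(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def preposterior_covariance(xs, delta, sigma2_c):
    """phi(X) of (4.12), as (I - z) Delta and as z (X' Sigma^{-1} X)^{-1}."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    k = [[n / sigma2_c, n * m1 / sigma2_c],
         [n * m1 / sigma2_c, n * m2 / sigma2_c]]        # X' Sigma^{-1} X
    a = mat2_mul(delta, k)
    z = mat2_mul(a, mat2_inv([[1 + a[0][0], a[0][1]],
                              [a[1][0], 1 + a[1][1]]]))  # (4.5)
    i_minus_z = [[1 - z[0][0], -z[0][1]], [-z[1][0], 1 - z[1][1]]]
    phi_first = mat2_mul(i_minus_z, delta)              # (I - z) Delta
    phi_second = mat2_mul(z, mat2_inv(k))               # z (X' Sigma^{-1} X)^{-1}
    return phi_first, phi_second


xs = [0.8, 1.1, 1.4, 1.7, 2.0, 2.3, 2.6, 2.9]  # hypothetical design
delta = [[0.5, 0.1], [0.1, 0.2]]              # hypothetical prior covariances
phi1, phi2 = preposterior_covariance(xs, delta, sigma2_c=1.0)
print(phi1)
print(phi2)
```

Both printed matrices should agree to rounding error and be symmetric, with diagonal entries smaller than 0.5 and 0.2.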

5. Integration of the Calibration and Measurement Stages

We may now complete our arguments about the relationship
between Sections 3 and 4, in light of the knowledge available
at each stage of the physical problem.

First, with only a prior hypothesis about our instrument
available, and no calibration contemplated, our best estimate
of $\boldsymbol{\beta}(\theta)$ is $\mathbf{b}$, with covariance Δ. If an inverse measurement
were to be performed at this point, (3.12) (3.13) is the formula
we would use to estimate the true input, and H in (3.18) is our
present estimate of the variance of this estimate.

Now, suppose we contemplate performing a calibration
experiment (X, n), with a fixed number of standards and fixed input
design, but the results of the calibration are not yet available.
We still have no basis for revising $\mathcal{E}\{\boldsymbol{\beta}(\theta)\}$, since the formula
(4.4) is, prior to calibration, unbiased. However, the
knowledge that there will be a calibration will reduce our instrument
covariance terms from Δ to $\phi(X)$. Therefore, prior to calibration,
our estimate of the forecast error variance after measurement
changes from (3.18) to:

$$H(X) = \mathcal{V}\{x\} - \frac{[b_2\,\mathcal{V}\{x\}]^2}{\sigma_M^2 + \mathcal{V}\{x\}\,(b_2^2 + \phi_{22}) + \phi_{11} + 2\phi_{12}\,\mathcal{E}\{x\} + \phi_{22}\,[\mathcal{E}\{x\}]^2}. \tag{5.1}$$

(This is the point at which the optimization of the next section will
be carried out.) A similar modification applies to (3.19).
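A short sketch of (5.1): it is (3.18) evaluated with Δ replaced by the preposterior covariance φ. The function name and all numbers in the usage lines are hypothetical; the loop shrinks a made-up covariance matrix toward zero to show that a more informative calibration lowers $H(X)$.

```python
def preposterior_forecast_variance(mean_x, var_x, b2, phi, sigma2_m):
    """H(X) of (5.1): the single-measurement variance (3.18) with Delta -> phi(X)."""
    denom = (sigma2_m + var_x * (b2 ** 2 + phi[1][1]) + phi[0][0]
             + 2 * phi[0][1] * mean_x + phi[1][1] * mean_x ** 2)
    return var_x - (b2 * var_x) ** 2 / denom


# Hypothetical moments: shrinking the instrument covariances lowers H(X).
delta = [[0.2, 0.05], [0.05, 0.1]]
for shrink in (1.0, 0.1, 0.0):
    phi = [[shrink * v for v in row] for row in delta]
    print(shrink, preposterior_forecast_variance(2.0, 0.1, 3.0, phi, 0.5))
```

With φ = 0 and no observation error the inversion is exact and $H(X) = 0$.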

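We now perform the calibration experiment, obtaining $\mathbf{y}$ and
the revised estimates, $\mathbf{g}(\mathbf{y}) \approx \mathcal{E}\{\boldsymbol{\beta}(\theta) \mid \mathbf{y}, X\}$, from (4.4). These
revised estimates of the instrument coefficients are then used
in (3.12) and (3.13), which become:

$$f(y \mid \mathbf{y}, X) = [1 - g_2(\mathbf{y})\,z_1(\mathbf{y}, X)]\,\mathcal{E}\{x\} + z_1(\mathbf{y}, X)\,[y - g_1(\mathbf{y})]; \tag{5.2}$$

$$z_1(\mathbf{y}, X) = \frac{g_2(\mathbf{y})\,\mathcal{V}\{x\}}{\sigma_M^2 + \mathcal{V}\{x\}\,([g_2(\mathbf{y})]^2 + \phi_{22}) + \phi_{11} + 2\phi_{12}\,\mathcal{E}\{x\} + \phi_{22}\,[\mathcal{E}\{x\}]^2}. \tag{5.3}$$

This is the final estimator for any unknown input, after the
calibration has been performed.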

We admit that it should, in principle, be possible to
revise our estimate of the covariance of the instrument
coefficients, φ, after the actual calibration outputs, $\mathbf{y}$, are
obtained; however, these terms are probably already small for
any reasonable calibration run, and to construct an additional
credibility approximation for the posterior-to-calibration
variance would require additional moments and complex formulae.
Similarly, it should be possible in principle to revise
our estimate of H(X) after the measurement y is made, but this
leads to the same additional complexity. If one wishes,
posterior to the calibration, one can replace $b_2$ in (5.1) by $g_2(\mathbf{y})$.

We mention again some of the limiting cases of (5.2) (5.3),
assuming that the revised instrument covariances are small.
First, if the observation error variance is very large, or
the variance in input is small, then the credibility in (5.3)
will be very small, and the best estimate of the input is the
prior mean. Conversely, a diffuse input, $\mathcal{V}\{x\} \to \infty$, will lead to
$z_1(\mathbf{y}, X) \approx [g_2(\mathbf{y})]^{-1}$ and to the forecast

$$f(y \mid \mathbf{y}, X) \approx \frac{y - g_1(\mathbf{y})}{g_2(\mathbf{y})}. \tag{5.4}$$

6. Optimization

For the optimization, we assume that there is a total of
T hours to be split among n calibration measurements, say a
total of $T_c$ hours, and the remainder, $T_M = T - T_c$ hours, to be
spent upon r inverse inference measurements. We assume that
one hour spent on a single measurement or calibration gives an
observation error variance of $\sigma^2$; therefore the individual
observation variances used previously are then:

$$\sigma_c^2 = \frac{n\sigma^2}{T_c}; \qquad \sigma_M^2 = \frac{r\sigma^2}{T_M}. \tag{6.1}$$
To minimize the prior-to-calibration estimate of the
forecast variance of a typical measurement, we must minimize
the denominator of the second term of H(X) in (5.1):

$$D(T_c, T_M) = \frac{r\sigma^2}{T_M} + \mathcal{V}\{x\}\,(b_2^2 + \phi_{22}) + \phi_{11} + 2\phi_{12}\,\mathcal{E}\{x\} + \phi_{22}\,[\mathcal{E}\{x\}]^2, \tag{6.2}$$

where φ is given by (4.12), with $\sigma_c^2$ replaced by $n\sigma^2/T_c$ in (4.8),
subject to $T_c + T_M = T$. In general, this optimization must be
carried out numerically. However, if $n\sigma^2/T_c$ is much smaller
than the diagonal terms of ΔM, then the calibration will have
practically full credibility, and

$$\phi = (I_2 - \mathbf{z})\,\Delta \approx \left[I_2 - \left(I_2 - \sigma_c^2\,(X'X)^{-1}\Delta^{-1}\right)\right]\Delta = \frac{\sigma^2}{T_c}\,M^{-1}. \tag{6.3}$$

This shows the expected result, namely, that a good calibration
run gives vanishing φ as $T_c$ increases. The effect of the number
of runs, n, is essentially cancelled out, as long as M is stable
over different designs.
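The full-credibility approximation (6.3) can be compared with the exact $\phi = (I_2 + A)^{-1}\Delta$, where $A = \Delta X'X/\sigma_c^2 = (T_c/\sigma^2)\,\Delta M$ by (6.1). The derivation of (6.3) uses $\Delta^{-1}$, so this sketch assumes an invertible Δ; the moments, Δ and the function names are hypothetical.

```python
def mat2_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]


def mat2_inv(a):
    """Inverse of a nonsingular 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def exact_phi(delta, m, t_c, sigma2):
    """phi = (I + A)^{-1} Delta with A = (T_c / sigma^2) Delta M."""
    a = [[t_c / sigma2 * v for v in row] for row in mat2_mul(delta, m)]
    return mat2_mul(mat2_inv([[1 + a[0][0], a[0][1]],
                              [a[1][0], 1 + a[1][1]]]), delta)


def approx_phi(m, t_c, sigma2):
    """Full-credibility approximation (6.3): phi ~ (sigma^2 / T_c) M^{-1}."""
    return [[sigma2 / t_c * v for v in row] for row in mat2_inv(m)]


def max_rel_error(delta, m, t_c, sigma2):
    """Largest element-wise relative gap between the exact and approximate phi."""
    e, p = exact_phi(delta, m, t_c, sigma2), approx_phi(m, t_c, sigma2)
    return max(abs(e[i][j] - p[i][j]) / abs(p[i][j])
               for i in range(2) for j in range(2))


m = [[1.0, 1.85], [1.85, 3.895]]   # hypothetical design moments M
delta = [[0.5, 0.1], [0.1, 0.2]]   # hypothetical, invertible Delta
for t_c in (1e1, 1e3, 1e5):
    print(t_c, max_rel_error(delta, m, t_c, sigma2=1.0))
```

The relative error shrinks roughly in proportion to $\sigma^2/T_c$, which is the regime in which (6.4) to (6.6) below are meant to apply.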

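With this approximation, (6.2) can be written:

$$D(T_c, T_M) = \mathcal{V}\{x\}\,b_2^2 + \sigma^2\left(\frac{r}{T_M} + \frac{\gamma}{T_c}\right), \tag{6.4}$$

where

$$\gamma = \frac{m_2 - 2m_1\,\mathcal{E}\{x\} + \mathcal{E}\{x^2\}}{m_2 - m_1^2}. \tag{6.5}$$

In this form, the optimization is obvious: the total time T
should be split so that

$$\frac{T_M^*}{T_c^*} = \sqrt{\frac{r}{\gamma}}, \tag{6.6}$$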
giving a minimal value for D of:

$$D^* = \mathcal{V}\{x\}\,b_2^2 + \frac{\sigma^2}{T}\left(\sqrt{r} + \sqrt{\gamma}\right)^2. \tag{6.7}$$

An increase in the number of production runs, r, decreases the
time used for calibration, but only through the square root in (6.6).

It is also interesting to note, in this approximation,
that the ratio of effort depends, in addition to r, only on the
first and second moments of the calibration design inputs, and
on the measurement input. If the design X is considered to be
variable, we see that we can further minimize (6.4) by decreasing
γ, i.e. we choose inputs x so that:

$$m_1 \approx \mathcal{E}\{x\}; \qquad (m_2 - m_1^2) \text{ as large as possible;} \tag{6.8}$$

which is very intuitive from a physical point of view.

This design choice would make γ close to unity, and then
$T_M^*/T_c^* \approx \sqrt{r}$. Of course, there may be many other physical
reasons why the calibration input must be chosen in a different
manner.

Even if the approximation (6.3) does not hold, (6.6) is suggested
as an initial trial solution.
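The split (6.6) takes only a few lines to compute. The sketch below uses the design moments of (7.13), the input moments of (7.9) and the values T = 720 h and r = 60 from Table 2 in Section 7. It also checks the closed form against a brute-force search over whole hours of the time-dependent part of (6.4). The function names are hypothetical.

```python
import math


def gamma_factor(m1, m2, mean_x, mean_x2):
    """gamma of (6.5); m1, m2 are design moments, mean_x2 = E{x^2}."""
    return (m2 - 2 * m1 * mean_x + mean_x2) / (m2 - m1 ** 2)


def optimal_split(total_time, r, gamma):
    """T_c*, T_M* from (6.6): T_M*/T_c* = sqrt(r / gamma), T_c* + T_M* = T."""
    ratio = math.sqrt(r / gamma)
    t_c = total_time / (1.0 + ratio)
    return t_c, total_time - t_c


def variable_part(t_c, total_time, r, gamma):
    """Time-dependent part of (6.4), divided by sigma^2: r/T_M + gamma/T_c."""
    return r / (total_time - t_c) + gamma / t_c


g = gamma_factor(1.85, 3.845, 2.668, 7.189)   # (7.13) and (7.9)
t_c, t_m = optimal_split(720.0, 60, g)        # T and r from Table 2
print(round(g, 3), round(t_c), round(t_m))    # 2.751 127 593
```

Because γ ≥ 1, with equality only when $m_1 = \mathcal{E}\{x\}$ and $\mathcal{V}\{x\}$ is negligible relative to $m_2 - m_1^2$, the design rule (6.8) can be read directly off `gamma_factor`.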
7. Numerical Example: Calorimetric Measurement of Nuclear
Material

In order to illustrate the models developed in previous
sections we use three kinds of information:

(1) a-priori information on the relationship between
dependent and independent variable;

(2) results of calibration;

(3) results of measurement of the dependent variable.

The following realistic example will illustrate circumstances
under which certain information is more important, and
the improvement that is achieved by using credibility procedures.

Let us consider the quantitative measurement of plutonium
with the help of a calorimeter. The problem is to measure
a voltage induced by the heat produced by the plutonium. For
this purpose, one has to know the isotopic composition of the
plutonium to be measured as well as the specific heat production
of the different isotopes. Typical data are given in Table 1.

Let P be the amount of plutonium of one batch to be measured,
and let w be the specific heat production of the plutonium
under consideration. Then the heat x produced by the amount
P of plutonium is given by

$$x = w \cdot P. \tag{7.1}$$

The voltage $E_M$ induced in the measurement chamber of the
calorimeter is proportional to this heat:

$$E_M = a \cdot x = a \cdot (wP). \tag{7.2}$$
In a second, identical chamber, a reference heat is generated
which induces a voltage $E_0$. Because of the assumed
symmetry of the chambers, we have

$$E_0 = a \cdot x_0. \tag{7.3}$$

The value of $x_0$ is kept constant throughout the operation of
the instrument. The quantity actually measured is the
differential voltage y,

$$y = E_0 - E_M, \tag{7.4}$$

or, in other words,

$$y = \beta_1 + \beta_2 \cdot x, \tag{7.5a}$$

where

$$\beta_1 = a \cdot x_0, \qquad \beta_2 = -a, \qquad a > 0. \tag{7.5b}$$

The value of $x_0$ may be assumed to be known precisely. In
addition, we assume there exists experience from past
measurements, expressed as expectation and variance of a, now considered
as a random variable. This means we know

$$b_1 = \mathcal{E}\{a\}\,x_0; \qquad b_2 = -\mathcal{E}\{a\}; \tag{7.6a}$$

$$\Delta = \begin{bmatrix} \mathcal{V}\{\beta_1\} & \mathcal{C}\{\beta_1;\beta_2\} \\ \mathcal{C}\{\beta_2;\beta_1\} & \mathcal{V}\{\beta_2\} \end{bmatrix} = \mathcal{V}\{a\}\begin{bmatrix} x_0^2 & -x_0 \\ -x_0 & 1 \end{bmatrix}. \tag{7.6b}$$


The calibration is performed by putting an electric heater into
the measurement chamber and generating different values $x_i$
of heat, which produce corresponding differential voltages

$$\Delta E_i = \beta_1 + \beta_2 \cdot x_i, \qquad i = 1, \ldots, n. \tag{7.7}$$

Typical data for such a measurement problem are given in
Table 2. According to this table, we have

$$b_1 = 600\ [\text{mV}], \tag{7.8a}$$

$$b_2 = -240\ [\text{mV/Watt}]; \tag{7.8b}$$

and furthermore,

$$\mathcal{V}\{a\} = (0.02)^2\,[\mathcal{E}\{a\}]^2 = 23.04\ [\text{mV}^2/\text{Watt}^2]. \tag{7.8c}$$

In addition, we have

$$\mathcal{E}\{x\} = 2.668, \qquad \mathcal{V}\{x\} = 0.07118, \qquad \mathcal{E}\{x^2\} = 7.189. \tag{7.9}$$

Therefore, we get for Δ, as defined by (3.10) and given by
(7.6),

$$\Delta = \begin{bmatrix} 6.25 & -2.5 \\ -2.5 & 1 \end{bmatrix} \cdot 23.04 = \begin{bmatrix} 144 & -57.6 \\ -57.6 & 23.04 \end{bmatrix}. \tag{7.10}$$
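The prior moments (7.6) and (7.10) follow mechanically from the mean and relative standard deviation of a. In this sketch, $x_0 = 2.5$ W is not stated in the text; it is the value implied by $b_1 = 600$ and $b_2 = -240$ through (7.6a). The function name is hypothetical.

```python
def calorimeter_prior(mean_a, rel_sd_a, x0):
    """b and Delta for y = a*x0 - a*x, eqs. (7.5)-(7.6), with V{a} = (rel_sd_a * E{a})^2."""
    var_a = (rel_sd_a * mean_a) ** 2                  # (7.8c)
    b = (mean_a * x0, -mean_a)                        # (7.6a)
    delta = [[var_a * x0 ** 2, -var_a * x0],
             [-var_a * x0, var_a]]                    # (7.6b)
    return b, var_a, delta


b, var_a, delta = calorimeter_prior(mean_a=240.0, rel_sd_a=0.02, x0=2.5)
print(b, round(var_a, 2), [[round(v, 2) for v in row] for row in delta])
# (600.0, -240.0) 23.04 [[144.0, -57.6], [-57.6, 23.04]]
```

Because both instrument parameters derive from the single random quantity a, this Δ has zero determinant ($\beta_1 = -x_0\beta_2$ exactly).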


Let us consider first the case that we do not perform any
calibration, but use only the prior information given by
equations (7.8) and (7.9). According to (3.12) the estimate
of the heat production is given by

$$f(y) = \mathcal{E}\{x\} + z_1\,(y - \mathcal{E}\{y\}) = 2.48 \cdot 10^{-3} - \frac{y - 600}{240 + 0.2234}, \tag{7.11}$$

which is to a good approximation

$$f(y) \approx -\frac{y - 600}{240}.$$

We can easily determine the preposterior improvement in
precision if we use (7.11) instead of simply using $\mathcal{E}\{x\}$; if we
take $\mathcal{E}\{x\}$, then the variance of this estimate is

$$H_0 = \mathcal{V}\{x\} = 0.07118.$$

Now, according to (3.18), we get for the variance of the
forecast error of a single measurement

$$H = \mathcal{V}\{x\} \cdot (1 - z_1 b_2) = \mathcal{V}\{x\} \cdot 9.31 \cdot 10^{-4} \approx 10^{-3} \cdot \mathcal{V}\{x\},$$

and, according to (3.19), for the variance of the forecast error
of the sum of r measurements

$$H_r = r\,\mathcal{V}\{x\}\,(1 - z_1 b_2) + (r^2 - r)\,z_1^2\left(\Delta_{11} + 2\Delta_{12}\,\mathcal{E}\{x\} + \Delta_{22}\,[\mathcal{E}\{x\}]^2\right) = 4.3 \cdot 10^{-3} + 4 \cdot 10^{-2} \approx 4.4 \cdot 10^{-2},$$
which shows that this variance is mainly determined by the
uncertainty of the instrument parameters, which is common to
all measurements.
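The figures in (7.11) and the two variances above can be reproduced from (3.9), (3.13), (3.18) and (3.19). The measurement variance is not restated at this point in the text. The sketch assumes that each of the r = 60 measurements receives T/r = 12 hours, so that $\sigma_M^2 = 18.324/12$ from the variance law in Table 2. This assumption is inferred from the fact that it reproduces the quoted numbers.

```python
# Prior moments from (7.8)-(7.10)
mean_x, var_x = 2.668, 0.07118
b1, b2 = 600.0, -240.0
d11, d12, d22 = 144.0, -57.6, 23.04
r = 60
# Assumption: each measurement gets T/r = 720/60 = 12 h, so
# sigma_M^2 = 18.324 / 12 from the variance law of Table 2.
sigma2_m = 18.324 / 12

var_y = (sigma2_m + var_x * (b2 ** 2 + d22) + d11
         + 2 * d12 * mean_x + d22 * mean_x ** 2)          # (3.9)
z1 = b2 * var_x / var_y                                    # (3.13)
intercept = (1 - b2 * z1) * mean_x                         # constant term of (7.11)
slope_denominator = -1 / z1                                # f(y) = intercept - (y - 600) / slope_denominator
h_factor = 1 - z1 * b2                                     # H = V{x} * h_factor, (3.18)
pair_cov = z1 ** 2 * (d11 + 2 * d12 * mean_x + d22 * mean_x ** 2)
h_total = r * var_x * h_factor + (r * r - r) * pair_cov    # (3.19)

print(f"intercept={intercept:.3e}  denominator={slope_denominator:.4f}")
print(f"1 - z1*b2={h_factor:.3e}  H_r={h_total:.2e}")
```

The printed values agree with (7.11) and with the quoted $9.31 \cdot 10^{-4}$ and $4.4 \cdot 10^{-2}$ to within rounding. The $(r^2 - r)$ term accounts for roughly nine tenths of $H_r$.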

Let us now use the calibration given in Table 2. With the
calibration inputs $x_i$ listed there, we have

$$X'X = 8\begin{bmatrix} 1 & 1.85 \\ 1.85 & 3.845 \end{bmatrix} = 8 \cdot M. \tag{7.13}$$

We can use the approximate formula (6.6) for the optimal
distribution of calibration and measurement effort, if $n\sigma^2/T_c$
is much smaller than the diagonal terms of Δ·M. We check this
assumption by first using equation (6.6) and then seeing whether
or not the result fulfills the assumption.

According to equation (6.6) and Table 2, the optimal
distribution of the time T available is given by

$$\frac{T_M^*}{T_c^*} = \sqrt{\frac{r}{\gamma}}, \qquad T_c^* + T_M^* = 720,$$

or, in other words,

$$T_c^* = 127, \qquad T_M^* = 593. \tag{7.14}$$

Therefore, we have


(7.15) 
which means that our assumptions are fulfilled.

Finally, we want to determine the improvement in
precision by using the calibration. According to equation
(4.12) we have

$$\phi(X) = (I_2 - \mathbf{z}) \cdot \Delta,$$

where $\mathbf{z}$ is given by (4.8). With (7.10), (7.13), and (7.15)
we obtain

/  5.96  12.54 

-  ° 

\-2.36  -4.94 


(7.16) 


Even though the forecast error variance after calibration and
measurement according to (5.1) can be determined only if the
calibration data $\Delta E_i$, i = 1, ..., n, are available, a comparison
of (7.16) and (7.10) shows that the use of the calibration
represents a considerable improvement in precision.
Table 1: Typical Plutonium Mixture

(Source: Schneider et al. [12])

Isotope   Mean concentration   Specific heat flux   Contribution to w
          [%]                  [mW/g]               [mW/g]
Pu238      0.041               569.0                0.2333
Pu239     90.51                  1.923              1.7405
Pu240      8.265                 7.03               0.581
Pu241      1.113                 4.62               0.052
Pu242      0.064                 0.12               7.69·10^-5
Am241      0.05                108.4                0.0612

Mean specific heat flux w: 2.668 [mW/g Pu]
Table 2: Typical Measurement Problem

(Source: Schneider et al. [12])

Number of batches r                                        60
Mean Pu content P [kg] of one batch                        1
Mean heat production x = w·P [W] of one batch              2.668
Batch-to-batch variation                                   10%
Variance of a single measurement σ²(t) [(mV)²] as a
  function of time t [h], for t > 6                        18.324/t
Total time T [h] available                                 720
Number of calibrations n                                   8
Range R of calibrations [Watt]                             0.8 ≤ R ≤ 3.0
Values x_i of the calibration procedure [Watt]             0.8, 1.1, ..., 2.9
A priori information b1 [mV] on the intercept              600
A priori information b2 [mV/Watt] on the slope of
  the calibration line                                     -240
A priori information on the variance of β
  (parametrically)                                         2%, 5%